SketchCow: Ha ha, white supremacists living under the pleasure of albania jrwr: I wonder if bitmitigate.com knows they are hosting them hook54321: Based on their twitter feed, I'm assuming so. https://twitter.com/bitmitigate ***: BlueMaxim has joined #archiveteam-bs namibj_: jrwr: I am willing to participate too.
Also, I am currently trying to get the stuff/knowledge to start some material/corrosion testing for long-term no-maintenance tape storage. Just the mechanical side for now though. I expect about 15$/TB to store for 1-2 decades, in lots of about 50TB+
Trying to get water vapor diffusion numbers for metal cans, which is not that easy. ***: drumstick has joined #archiveteam-bs
drumstick has quit IRC (Ping timeout: 268 seconds)
drumstick has joined #archiveteam-bs Somebody2: plue: just usernames is worth putting up on IA just so it's less likely to get lost.
It's not that valuable historically without the actual data, agreed.
One thing we could do that would likely be a smaller pile of data, is to grab *just text* from all the blogs on that list.
I know the images on Tumblr are really important, but so is the text, and it's lots smaller.
Also, making a warrior project just to grab metadata from all those blogs would be nice.
i.e. how many posts, what's the oldest and newest, etc.
It would also be *great* if knowledge of writing warrior projects was more widely distributed, so if any of you want to practice... namibj_: Somebody2: I do approve a lot of the effort to distribute the knowledge of writing warrior projects
Do we have _any_ useful estimate of the combined size of the tumblr images? We are kind-of freaked out by our expectations, so we have not seriously tried to start archiving at all...
We are currently archiving a selection of 4chan boards though, and will likely soon have 8chan ready too, but those archives are a bit of an issue due to them being compiled live, i.e. before moderation, and if you know 4chan there is a lot of stuff you do not want to host. Moderation filters most of that out, but in the spirit of archiving we prefer to not restrict ourselves to censored records of
history. Personally I would decide against censorship in a binary decision (on current levels of censorship, i.e. I'd prefer to just get rid of all censorship than keep the present state), but that is hardly something really to consider when you speak of the huge amount stored by tumblr. Somebody2: I don't think that's so much an issue with Tumblr, especially by now, considering the censorship Verizon has started putting in. namibj_: Yeah. Much less there. Somebody2: As for total size -- you can get some estimates by looking at the sizes of the many tumblr blogs we've grabbed with Archivebot.
then multiply by the number in the list. namibj_: Do you have numbers? Somebody2: Not offhand -- but the Archivebot data is all available on IA, so you can figure it out there. namibj_: Oh ok Somebody2: If we want to discuss this further, we should probably make a separate channel; name suggestions? -: Somebody2 is going AFK for a bit namibj_: For storage estimates I am only really concerned that the accuracy is so that if you say 10TB, it might also be 2.5 or 40, but in that area. ***: wabu has joined #archiveteam-bs namibj_: tumbleweed?
But tbh last name suggestion i made a few months ago was pretty terrible, according to the others.
Considering the scale to be a number of bits needed to count the bytes stored, i.e. 32 being 4GiB, 40 being 1TiB, anything up to 38 is no issue, likely to be more work to manage than to store, 44 no issue to store in a potential cold storage, and 55 about the limit the warrior system can reasonably do for a single project. Anything above 36 is not necessarily for the internet archive though.
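A minimal sketch of the bit scale described above, combined with the "sample size times number of blogs" estimate suggested earlier. The function names (`size_bits`, `tier`, `estimate_total`, `plausible_range`) and the tier labels are hypothetical and only paraphrase the message; the thresholds 38, 44 and 55 are the ones named in it. The factor-of-4 band reflects the accuracy asked for ("10TB, it might also be 2.5 or 40"), which corresponds to plus or minus 2 on this scale.

```python
def size_bits(num_bytes: int) -> int:
    """Bits needed to count num_bytes bytes, i.e. ceil(log2(num_bytes))."""
    if num_bytes < 1:
        raise ValueError("num_bytes must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for every integer n >= 1.
    return (num_bytes - 1).bit_length()


def tier(bits: int) -> str:
    """Map a bit count to the rough tiers named in the chat (labels are illustrative)."""
    if bits <= 38:
        return "no issue to store"
    if bits <= 44:
        return "fine for cold storage"
    if bits <= 55:
        return "near the limit for a single warrior project"
    return "beyond what a single warrior project can reasonably do"


def estimate_total(sample_sizes: list[int], blog_count: int) -> int:
    """Extrapolate total bytes: mean size of the sampled blogs times the number of blogs."""
    if not sample_sizes:
        raise ValueError("need at least one sampled blog size")
    return sum(sample_sizes) * blog_count // len(sample_sizes)


def plausible_range(estimate: int) -> tuple[int, int]:
    """Factor-of-4 band around an estimate (plus or minus 2 bits on the scale)."""
    return estimate // 4, estimate * 4


if __name__ == "__main__":
    ten_tb = 10 * 10**12
    print(size_bits(ten_tb), tier(size_bits(ten_tb)))
    print(plausible_range(ten_tb))
```

The sample sizes would have to come from real measurements, for example the sizes of the Tumblr blogs already grabbed with Archivebot; no such numbers are given in the log.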
At least not yet. Somebody2: tumbleweed seems fine to me
but it's already in use :-( namibj_: Uh
Any better idea? Somebody2: tumbledown? namibj_: yeah Somebody2: feel free to join it jrwr: namibj_: what are you going on about with this storage system
We have newsbuddy eating that on a daily basis now Somebody2: jrwr: *How* much is newsbuddy producing per day? jrwr: I handled 16 million urls yesterday
on the dedupe server
38TB per the tracker http://tracker.archiveteam.org/newsgrabber/
mind you, we know that tumblr will be a stupid large project if we ever need to save them Somebody2: Cool, nice to have the stats. ***: namibj_ has quit IRC (Ping timeout: 268 seconds)
namibj1 has joined #archiveteam-bs namibj1: Somebody2: sorry for the delay, my bouncer has some connectivity issues. jrwr: Ya
If you join #newsgrabberbot
you can see as the URLs are added in ***: Fletcher- is now known as Fletcher_
Kisikilli has quit IRC (Quit: http://www.okay.uz/ (Ping timeout)) Somebody2: namibj1: no worries, it happens ***: Fletcher is now known as Fletcher-
Fletcher_ is now known as Fletcher
qw3rty115 has joined #archiveteam-bs
qw3rty114 has quit IRC (Read error: Operation timed out)
HarryCros has quit IRC (Read error: Connection reset by peer)
HarryCros has joined #archiveteam-bs
Sk1d has quit IRC (Ping timeout: 250 seconds)
Sk1d has joined #archiveteam-bs
namibj_ has joined #archiveteam-bs
klg has joined #archiveteam-bs
schbirid has joined #archiveteam-bs
brayden has quit IRC (Read error: Connection reset by peer) hook54321: JAA: I'm not sure where this twitter feed is getting info from, but slightly concerning. https://twitter.com/dnsstream/status/902035519776858112
huh. .al domains don't have a central whois server. godane: did anyone look at pressreader to get more recent newspapers
recent as in the last 10+ years ***: Mateon1 has quit IRC (Ping timeout: 245 seconds)
Mateon1 has joined #archiveteam-bs
brayden has joined #archiveteam-bs
swebb sets mode: +o brayden godane: i think issuu.com is screwed up HCross2: jrwr: sorry to burst your bubble - that 36TB is since we moved to the warrior. We do about 1TB a day ***: schbirid has quit IRC (Quit: Leaving)
atluxity has quit IRC (Ping timeout: 506 seconds) godane: looks like issuu.com is fixed ***: drumstick has quit IRC (Read error: Operation timed out)
drumstick has joined #archiveteam-bs
JerryStie has quit IRC (Read error: Operation timed out)
SimpBrain has quit IRC (Remote host closed the connection)
ruunyan has quit IRC (Read error: Operation timed out)
ruunyan has joined #archiveteam-bs
BlueMaxim has quit IRC (Quit: Leaving) odemg: godane, finally coming in but he still wants to clean them up before putting them out properly, here's what he's handed me so far and wants feedback on cleaning up the video before encoding/putting them out: http://dheval.eieidoh.net:8880/DataHoarder/tmp_requests/rednight39_Conan_VHS/ ***: drumstick has quit IRC (Read error: Operation timed out) REiN^: some scene stuff AJJ-The_Bible_2-CD-FLAC-2016-NBFLAC
anybody collecting soft for BeOS? godane: odemg: i'm downloading it now odemg: <3 godane: i don't know much about video capturing and encoding but i do my best odemg: he's all good with that he just wants to restore the video as much as possible godane: odemg: do you know how to get pressreader newspapers by chance? odemg: godane, not tried, no idea what those are so nope?
godane, if you're not on the tracker you may be interested in this, though it's taking forever to derive: https://archive.org/details/societyglitchaugust2017
it would be nice to have it sorted but it's a huge task, I've also saved all the original upload descriptions some of which are quite crazy... example: https://www.reddit.com/r/opendirectories/comments/6waf8b/floyd_mayweather_vs_conor_mcgregor/dm6kaiy/?context=3 namibj_: what was the name for the tumblr channel again? I got DDOSd, and my bouncer is weird. PurpleSym: namibj_: #tumbledown namibj_: thx ***: kristian_ has joined #archiveteam-bs JAA: "Wayback Machine failed to return archive information." and "This snapshot cannot be displayed due to an internal error." :-| ***: HCross has joined #archiveteam-bs
HarryCros has quit IRC (Ping timeout: 268 seconds)
kristian_ has quit IRC (Quit: Leaving)
Lord_Nigh has quit IRC (Read error: Operation timed out)
Lord_Nigh has joined #archiveteam-bs
schbirid has joined #archiveteam-bs atrocity: do people actually like old commercials? namibj_: atrocity: sometimes. more as a curiosity effect/looking at stuff in a museum, than for afternoon tv binge replacement, afaik ***: robink has quit IRC (Read error: Connection reset by peer)
robink has joined #archiveteam-bs
RichardG has quit IRC (Read error: Connection reset by peer)
RichardG has joined #archiveteam-bs
Asparagir has joined #archiveteam-bs atrocity: haha, ok. i have a bunch of old tapes i have from...sources...that has old 80's and 90's commercials on them ***: RichardG has quit IRC (Read error: Connection reset by peer) hook54321: JAA: I saw someone post about that on Twitter as well. ***: RichardG has joined #archiveteam-bs
RichardG_ has joined #archiveteam-bs
fie has quit IRC (Ping timeout: 268 seconds)
RichardG has quit IRC (Ping timeout: 370 seconds) hook54321: I wish the Community College near me had something like this, except less expensive. http://digitalcuration.umaine.edu/ ***: RichardG has joined #archiveteam-bs
RichardG has quit IRC (Read error: Connection reset by peer)
RichardG_ has quit IRC (Ping timeout: 370 seconds)
fie has joined #archiveteam-bs godane: odemg: the conan videos look great
i don't think filters would be needed odemg: yeah i didnt think so ***: kimmer has joined #archiveteam-bs
sep332_ has joined #archiveteam-bs
sep332 has quit IRC (Read error: Operation timed out) namibj1: atrocity: do you have h/w to digitize them without compression? I.e. read the video signal into raw pixels and then just store them for later non-real-time compression? If not, we could probably find some way to digitize them well. May I ask, what country they are in? astrid: i have an ntsc-firewire capture thingy that encodes each frame as a distinct jpeg
it looks a lot better than mpeg
but the files are yooge ***: RichardG has joined #archiveteam-bs namibj1: Good
You encode later with ffmpeg/libx264
astrid: I can guide you through the encoding and creation of the files for upload. astrid: excuse me? namibj1: If you have a way to feed the vhs into your computer and store the files for about a week or so. astrid: i don't have the tapes
you probably are talking to the wrong person namibj1: Oh
sry, I thought you responded. astrid: i responded that such hardware exists
that's all :) namibj1: Uh I know it exists
I just don't know if he has access. -: astrid nods quietly
namibj1 knows a bit or two about video coding astrid: sorry namibj1: Nah, I don't have that much experience with old tech. ***: drumstick has joined #archiveteam-bs
Geekonoci has joined #archiveteam-bs
schbirid has quit IRC (Quit: Leaving) atrocity: i meant they were ripped tapes
from a private tracker
so whatever format they're in (probably shitty mpeg) astrid: ah nice atrocity: like old nick shows and stuff with commercials in between and stuff namibj1: Uh
So the stuff you can access is already digital?
Do you have a way to upload them? atrocity: this is kind of scary: https://tech.slashdot.org/story/17/08/28/1725232/how-the-nsa-identified-satoshi-nakamoto