[00:01] JAA: They're live again. https://dailystormer.al/
[00:07] Ha ha, white supremacists living under the pleasure of albania
[00:10] I wonder if bitmitigate.com knows they are hosting them
[00:11] Based on their twitter feed, I'm assuming so. https://twitter.com/bitmitigate
[00:30] *** BlueMaxim has joined #archiveteam-bs
[00:35] jrwr: I am willing to participate too.
[00:37] Also, I am currently trying to get the stuff/knowledge to start some material/corrosion testing for long-term no-maintenance tape storage. Just the mechanical side for now though. I expect about $15/TB to store for 1-2 decades, in lots of about 50TB+
[00:38] Trying to get water vapor diffusion numbers for metal cans, which is not that easy.
[01:28] *** drumstick has joined #archiveteam-bs
[01:33] *** drumstick has quit IRC (Ping timeout: 268 seconds)
[01:38] *** drumstick has joined #archiveteam-bs
[01:44] plue: just usernames is worth putting up on IA just so it's less likely to get lost.
[01:44] It's not that valuable historically without the actual data, agreed.
[01:46] One thing we could do that would likely be a smaller pile of data is to grab *just text* from all the blogs on that list.
[01:46] I know the images on Tumblr are really important, but so is the text, and it's lots smaller.
[01:46] Also, making a warrior project just to grab metadata from all those blogs would be nice.
[01:46] i.e. how many posts, what's the oldest and newest, etc.
[01:47] It would also be *great* if knowledge of writing warrior projects was more widely distributed, so if any of you want to practice...
[02:07] Somebody2: I do approve a lot of the effort to distribute the knowledge of writing warrior projects
[02:09] Do we have _any_ useful estimate of the combined size of the tumblr images? We are kind-of freaked out by our expectations, so we have not seriously tried to start archiving at all...
[02:14] We are currently archiving a selection of 4chan boards though, and will likely soon have 8chan ready too, but those archives are a bit of an issue due to them being compiled live, i.e. before moderation, and if you know 4chan there is a lot of stuff you do not want to host. Moderation filters most of that out, but in the spirit of archiving we prefer to not restrict ourselves to censored records of
[02:14] history. Personally I would decide against censorship in a binary decision (on current levels of censorship, i.e. I'd prefer to just get rid of all censorship than keep the present state), but that is hardly something really to consider when you speak of the huge amount stored by tumblr.
[02:15] I don't think that's so much an issue with Tumblr, especially by now, considering the censorship Verizon has started putting in.
[02:16] Yeah. Much less there.
[02:16] As for total size -- you can get some estimates by looking at the sizes of the many tumblr blogs we've grabbed with Archivebot.
[02:16] then multiply by the number in the list.
[02:16] Do you have numbers?
[02:17] Not offhand -- but the Archivebot data is all available on IA, so you can figure it out there.
[02:17] Oh ok
[02:17] If we want to discuss this further, we should probably make a separate channel; name suggestions?
[02:17] * Somebody2 is going AFK for a bit
[02:18] For storage estimates I am only really concerned that the accuracy is such that if you say 10TB, it might also be 2.5 or 40, but in that area.
[02:18] *** wabu has joined #archiveteam-bs
[02:18] tumbleweed?
[02:19] But tbh last name suggestion i made a few months ago was pretty terrible, according to the others.
[02:23] Considering the scale to be the number of bits needed to count the bytes stored, i.e. 32 being 4GiB, 40 being 1TiB, anything up to 38 is no issue, likely to be more work to manage than to store, 44 no issue to store in a potential cold storage, and 55 about the limit the warrior system can reasonably do for a single project. Anything above 36 is not necessarily for the internet archive though.
[02:23] At least not yet.
[02:25] tumbleweed seems fine to me
[02:26] but it's already in use :-(
[02:26] Uh
[02:26] Any better idea?
[02:27] tumbledown?
[02:27] yeah
[02:27] feel free to join it
[02:41] namibj_: what are you going on about with this storage system
[02:41] We have newsbuddy eating that on a daily basis now
[02:42] jrwr: *How* much is newsbuddy producing per day?
[02:42] I handled 16 million urls yesterday
[02:42] on the dedupe server
[02:43] 38TB per the tracker http://tracker.archiveteam.org/newsgrabber/
[02:44] mind you, we know that tumblr will be a stupid large project if we ever need to save them
[02:46] Cool, nice to have the stats.
[02:47] *** namibj_ has quit IRC (Ping timeout: 268 seconds)
[02:47] *** namibj1 has joined #archiveteam-bs
[02:48] Somebody2: sorry for the delay, my bouncer has some connectivity issues.
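The bit-count scale from the [02:23] message (32 being 4GiB, 40 being 1TiB) can be sketched roughly as below. This is only an illustration of the arithmetic: the function names scale_bits and bytes_at_scale are made up here, and binary units (GiB, TiB, PiB) are assumed throughout, as in the message itself.

```python
def scale_bits(num_bytes: int) -> int:
    """Return the smallest b such that 2**b >= num_bytes.

    This matches the chat's convention that 4 GiB (2**32 bytes) is
    scale 32 and 1 TiB (2**40 bytes) is scale 40.
    """
    if num_bytes < 1:
        raise ValueError("num_bytes must be at least 1")
    # (n - 1).bit_length() is ceil(log2(n)) for positive integers,
    # computed exactly without floating point.
    return (num_bytes - 1).bit_length()


def bytes_at_scale(bits: int) -> str:
    """Render 2**bits bytes using binary unit prefixes (hypothetical helper)."""
    if bits < 0:
        raise ValueError("bits must be non-negative")
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    # Every 10 bits moves up one binary prefix; cap at the largest unit.
    idx = min(bits // 10, len(units) - 1)
    value = 2 ** (bits - 10 * idx)
    return f"{value} {units[idx]}"


if __name__ == "__main__":
    # The scale points named in the message.
    for b in (32, 36, 38, 40, 44, 55):
        print(b, "->", bytes_at_scale(b))
```

Read this way, the "no issue" threshold of 38 is 256 GiB, the cold-storage point of 44 is 16 TiB, and the stated warrior limit of 55 is 32 PiB.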
[02:53] Ya
[02:53] If you join #newsgrabberbot
[02:53] you can see as the URLs are added in
[02:56] *** Fletcher- is now known as Fletcher_
[03:07] *** Kisikilli has quit IRC (Quit: http://www.okay.uz/ (Ping timeout))
[03:24] namibj1: no worries, it happens
[03:27] *** Fletcher is now known as Fletcher-
[03:27] *** Fletcher_ is now known as Fletcher
[03:52] *** qw3rty115 has joined #archiveteam-bs
[03:56] *** qw3rty114 has quit IRC (Read error: Operation timed out)
[04:09] *** HarryCros has quit IRC (Read error: Connection reset by peer)
[04:09] *** HarryCros has joined #archiveteam-bs
[04:11] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:18] *** Sk1d has joined #archiveteam-bs
[04:40] *** namibj_ has joined #archiveteam-bs
[05:51] *** klg has joined #archiveteam-bs
[06:07] *** schbirid has joined #archiveteam-bs
[06:10] *** brayden has quit IRC (Read error: Connection reset by peer)
[06:14] JAA: I'm not sure where this twitter feed is getting info from, but slightly concerning. https://twitter.com/dnsstream/status/902035519776858112
[06:18] huh. .al domains don't have a central whois server.
[06:23] did anyone look at pressreader to get more recent newspapers
[06:23] recent as in the last 10+ years
[06:32] *** Mateon1 has quit IRC (Ping timeout: 245 seconds)
[06:32] *** Mateon1 has joined #archiveteam-bs
[07:01] *** brayden has joined #archiveteam-bs
[07:01] *** swebb sets mode: +o brayden
[07:01] i think issuu.com is screwed up
[07:26] jrwr: sorry to burst your bubble - that 36TB is since we moved to the warrior. We do about 1TB a day
[07:30] *** schbirid has quit IRC (Quit: Leaving)
[07:37] *** atluxity has quit IRC (Ping timeout: 506 seconds)
[07:40] looks like issuu.com is fixed
[08:26] *** drumstick has quit IRC (Read error: Operation timed out)
[08:39] *** drumstick has joined #archiveteam-bs
[08:51] *** JerryStie has quit IRC (Read error: Operation timed out)
[09:24] *** SimpBrain has quit IRC (Remote host closed the connection)
[09:47] *** ruunyan has quit IRC (Read error: Operation timed out)
[09:49] *** ruunyan has joined #archiveteam-bs
[10:48] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:36] godane, finally coming in but he still wants to clean them up before putting them out properly, here's what he's handed me so far and wants feedback on cleaning up the video before encoding/putting them out: http://dheval.eieidoh.net:8880/DataHoarder/tmp_requests/rednight39_Conan_VHS/
[11:40] *** drumstick has quit IRC (Read error: Operation timed out)
[11:43] some scene stuff AJJ-The_Bible_2-CD-FLAC-2016-NBFLAC
[11:44] anybody collecting soft for BeOS?
[11:46] odemg: i'm downloading it now
[11:47] <3
[11:47] i don't know much about video capturing and encoding but i do my best
[11:48] he's all good with that he just wants to restore the video as much as possible
[11:49] odemg: do you know how to get pressreader newspapers by chance?
[12:18] godane, not tried, no idea what those are so nope?
[12:25] godane, if you're not on the tracker you may be interested in this, though it's taking forever to derive: https://archive.org/details/societyglitchaugust2017
[12:29] it would be nice to have it sorted but it's a huge task, I've also saved all the original upload descriptions, some of which are quite crazy... example: https://www.reddit.com/r/opendirectories/comments/6waf8b/floyd_mayweather_vs_conor_mcgregor/dm6kaiy/?context=3
[13:22] what was the name for the tumblr channel again? I got DDOSd, and my bouncer is weird.
[13:23] namibj_: #tumbledown
[13:34] thx
[15:13] *** kristian_ has joined #archiveteam-bs
[15:15] "Wayback Machine failed to return archive information." and "This snapshot cannot be displayed due to an internal error." :-|
[15:19] *** HCross has joined #archiveteam-bs
[15:22] *** HarryCros has quit IRC (Ping timeout: 268 seconds)
[16:00] *** kristian_ has quit IRC (Quit: Leaving)
[16:10] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[16:13] *** Lord_Nigh has joined #archiveteam-bs
[16:34] *** schbirid has joined #archiveteam-bs
[16:43] do people actually like old commercials?
[16:51] atrocity: sometimes. more as a curiosity effect/looking at stuff in a museum, than for afternoon tv binge replacement, afaik
[17:05] *** robink has quit IRC (Read error: Connection reset by peer)
[17:09] *** robink has joined #archiveteam-bs
[17:23] *** RichardG has quit IRC (Read error: Connection reset by peer)
[17:25] *** RichardG has joined #archiveteam-bs
[19:24] *** Asparagir has joined #archiveteam-bs
[19:47] haha, ok. i have a bunch of old tapes i have from...sources...that has old 80's and 90's commercials on them
[20:11] *** RichardG has quit IRC (Read error: Connection reset by peer)
[20:13] JAA: I saw someone post about that on Twitter as well.
[20:16] *** RichardG has joined #archiveteam-bs
[20:19] *** RichardG_ has joined #archiveteam-bs
[20:20] *** fie has quit IRC (Ping timeout: 268 seconds)
[20:23] *** RichardG has quit IRC (Ping timeout: 370 seconds)
[20:23] I wish the Community College near me had something like this, except less expensive. http://digitalcuration.umaine.edu/
[20:24] *** RichardG has joined #archiveteam-bs
[20:26] *** RichardG has quit IRC (Read error: Connection reset by peer)
[20:29] *** RichardG_ has quit IRC (Ping timeout: 370 seconds)
[20:34] *** fie has joined #archiveteam-bs
[20:45] odemg: the conan videos look great
[20:46] i don't think filters would be needed
[20:48] yeah i didnt think so
[20:58] *** kimmer has joined #archiveteam-bs
[21:23] *** sep332_ has joined #archiveteam-bs
[21:24] *** sep332 has quit IRC (Read error: Operation timed out)
[21:32] atrocity: do you have h/w to digitize them without compression? I.e. read the video signal into raw pixels and then just store them for later non-real-time compression? If not, we could probably find some way to digitize them well. May I ask what country they are in?
[21:33] i have an ntsc-firewire capture thingy that encodes each frame as a distinct jpeg
[21:33] it looks a lot better than mpeg
[21:33] but the files are yooge
[21:34] *** RichardG has joined #archiveteam-bs
[21:44] Good
[21:44] You encode later with ffmpeg/libx264
[21:45] astrid: I can guide you through the encoding and creation of the files for upload.
[21:46] excuse me?
[21:46] If you have a way to feed the vhs into your computer and store the files for about a week or so.
[21:46] i don't have the tapes
[21:46] you probably are talking to the wrong person
[21:46] Oh
[21:47] sry, I thought you responded.
[21:47] i responded that such hardware exists
[21:48] that's all :)
[21:48] Uh I know it exists
[21:48] I just don't know if he has access.
[21:48] * astrid nods quietly
[21:49] * namibj1 knows a bit or two about video coding
[21:49] sorry
[21:50] Nah, I don't have that much experience with old tech.
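The "capture now, encode later with ffmpeg/libx264" workflow suggested at [21:44] might look something like the sketch below. Nothing here comes from the channel: the helper name build_x264_command, the file names, the CRF of 18, the "slow" preset, the yadif deinterlacing step and the AAC audio settings are all assumptions chosen for illustration, not recommendations anyone in the log made.

```python
from typing import List


def build_x264_command(
    src: str,
    dst: str,
    crf: int = 18,             # assumed quality target; lower is higher quality
    preset: str = "slow",      # assumed speed/size trade-off
    deinterlace: bool = True,  # NTSC tape captures are usually interlaced
) -> List[str]:
    """Build an ffmpeg argument list for an offline libx264 encode.

    Hypothetical helper: it only assembles the command and does not
    run ffmpeg or check that the input file exists.
    """
    if not 0 <= crf <= 51:
        raise ValueError("crf must be in the range 0..51 for 8-bit x264")
    cmd = ["ffmpeg", "-i", src]
    if deinterlace:
        # yadif is ffmpeg's built-in deinterlacing filter.
        cmd += ["-vf", "yadif"]
    cmd += [
        "-c:v", "libx264",
        "-preset", preset,
        "-crf", str(crf),
        "-c:a", "aac",
        "-b:a", "192k",
        dst,
    ]
    return cmd


if __name__ == "__main__":
    # File names are placeholders.
    print(" ".join(build_x264_command("capture.avi", "capture.mkv")))
```

The resulting list can be handed to subprocess.run(cmd, check=True) on a machine with ffmpeg installed. Keeping the large per-frame-JPEG capture around until the encode has been checked matches the store-first, compress-later idea in the log.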
[22:07] *** drumstick has joined #archiveteam-bs
[22:23] *** Geekonoci has joined #archiveteam-bs
[22:33] *** schbirid has quit IRC (Quit: Leaving)
[23:04] i meant they were ripped tapes
[23:04] from a private tracker
[23:04] so whatever format they're in (probably shitty mpeg)
[23:04] ah nice
[23:04] like old nick shows and stuff with commercials in between and stuff
[23:07] Uh
[23:07] So the stuff you can access is already digital?
[23:07] Do you have a way to upload them?
[23:51] this is kind of scary: https://tech.slashdot.org/story/17/08/28/1725232/how-the-nsa-identified-satoshi-nakamoto