[00:29] *** vitzli has joined #archiveteam
[00:31] *** [phire] has quit IRC (Quit: ZNC - http://znc.in)
[00:32] *** [phire] has joined #archiveteam
[00:35] *** WinterFox has joined #archiveteam
[00:37] *** yipdw has quit IRC (Ping timeout: 506 seconds)
[00:47] *** zhongfu has joined #archiveteam
[00:56] *** yipdw has joined #archiveteam
[01:00] *** Ghost_of_ has quit IRC (Quit: Leaving)
[01:05] *** nertzy2 has joined #archiveteam
[01:18] *** vitzli has quit IRC (Leaving)
[01:30] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[01:37] *** JesseW has joined #archiveteam
[01:38] *** megaminxw has joined #archiveteam
[01:39] *** nertzy2 has quit IRC (Quit: This computer has gone to sleep)
[02:24] *** schbirid2 has joined #archiveteam
[02:25] I let gamefront go. It's now uploading 3.2tb
[02:25] *** schbirid has quit IRC (Read error: Operation timed out)
[02:36] *** JesseW has quit IRC (Leaving.)
[03:03] *** JesseW has joined #archiveteam
[03:10] *** fie has joined #archiveteam
[03:47] *** megaminxw has quit IRC (Quit: Leaving.)
[04:15] New IA census data grabbed, totaling 34 GB. Now extracting the identifiers from it (should take about half an hour), to find out which ones didn't come through.
[04:17] Then I'll extract just the hashes, and then I'll have to figure out how to sort a ~15G file, and diff it against another one...
[04:35] *** BlueMaxim has joined #archiveteam
[04:54] Can probably do it with sort on a box with >15GB ram
[05:03] JesseW: GNU sort with -T
[05:03] that instructs sort to use temporary files and merge them
[05:03] you don't need gobs of RAM at that point, just enough storage
[05:04] *** JetBalsa has quit IRC (Read error: Connection reset by peer)
[05:08] cool; I also heard about -S and --parallel to adjust that
[05:16] *** Muad-Dib has joined #archiveteam
[05:53] *** dcmorton has joined #archiveteam
[05:53] *** dcmorton has quit IRC (Excess Flood)
[05:53] *** dcmorton has joined #archiveteam
[06:43] *** vitzli has joined #archiveteam
[07:02] *** megaminxw has joined #archiveteam
[07:19] *** megaminxw has quit IRC (Ping timeout: 633 seconds)
[07:23] *** megaminxw has joined #archiveteam
[07:39] *** vitzli has quit IRC (Leaving)
[07:51] *** JesseW has quit IRC (Leaving.)
[08:19] *** atomotic has joined #archiveteam
[08:20] *** atomotic has quit IRC (Client Quit)
[08:31] *** atomotic has joined #archiveteam
[08:58] *** terburg has joined #archiveteam
[09:22] *** terburg_ has joined #archiveteam
[09:26] *** terburg has quit IRC (Ping timeout: 499 seconds)
[09:26] *** terburg_ is now known as terburg
[09:43] *** terburg has quit IRC (Ping timeout: 258 seconds)
[10:21] Working on this: http://fos.textfiles.com/MENU/
[11:02] *** SilSte has quit IRC (Quit: No Ping reply in 180 seconds.)
[11:02] *** SilSte has joined #archiveteam
[11:02] *** Kazzy has quit IRC (Ping timeout: 260 seconds)
[11:04] *** Kazzy has joined #archiveteam
[11:11] *** VADemon has joined #archiveteam
[12:12] *** arkiver3 has joined #archiveteam
[12:22] *** WinterFox has quit IRC (Remote host closed the connection)
[12:24] *** WinterFox has joined #archiveteam
[12:33] *** arkiver3 has quit IRC (Ping timeout: 252 seconds)
[13:06] *** arkiver3 has joined #archiveteam
[13:07] *** megaminxw has quit IRC (Quit: Leaving.)
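The sort-and-diff question from [04:17] to [05:08] can be illustrated with a rough sketch of the idea behind `sort -T`: sort chunks that fit in memory, write each as a temporary run on disk, merge the runs, then walk two sorted files side by side. This is only an approximation of what GNU sort does internally, not its actual implementation. The function names, the `lines_per_chunk` parameter, and the one-hash-per-line layout are assumptions made for the example.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src, dst, tmp_dir, lines_per_chunk=1_000_000):
    """Sort the lines of src into dst using sorted temporary runs in tmp_dir.

    Ordering is Python's code-point order, which may differ from GNU sort
    under a non-C locale.
    """
    runs = []
    with open(src, "r", encoding="utf-8") as infile:
        while True:
            chunk = list(islice(infile, lines_per_chunk))
            if not chunk:
                break
            # Make sure the last line of the file is newline-terminated too.
            chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
            chunk.sort()
            run = tempfile.NamedTemporaryFile(
                "w", dir=tmp_dir, delete=False, encoding="utf-8", suffix=".run"
            )
            with run:
                run.writelines(chunk)
            runs.append(run.name)

    # Merge all sorted runs lazily; only one line per run is held in memory.
    handles = [open(name, "r", encoding="utf-8") for name in runs]
    try:
        with open(dst, "w", encoding="utf-8") as out:
            out.writelines(heapq.merge(*handles))
    finally:
        for handle in handles:
            handle.close()
        for name in runs:
            os.remove(name)


def only_in_first(sorted_a, sorted_b):
    """Yield lines found in sorted_a but not in sorted_b.

    Both files must have been sorted with the same ordering, e.g. by
    external_sort above.
    """
    with open(sorted_a, encoding="utf-8") as fa, \
            open(sorted_b, encoding="utf-8") as fb:
        b_line = fb.readline()
        for a_line in fa:
            # Advance the second file until it catches up with a_line.
            while b_line and b_line < a_line:
                b_line = fb.readline()
            if a_line != b_line:
                yield a_line.rstrip("\n")
```

Memory use is bounded by `lines_per_chunk` rather than by file size, which matches the point made at [05:03] that storage matters more than RAM. In practice the coreutils tools are likely the better choice for a 15G file; this sketch only shows why the approach works.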
[13:10] *** arkiver3 has quit IRC (Ping timeout: 252 seconds)
[13:41] *** atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[13:59] *** nertzy2 has joined #archiveteam
[13:59] *** BlueMaxim has quit IRC (Quit: Leaving)
[14:19] *** nertzy2 has quit IRC (Quit: This computer has gone to sleep)
[14:24] *** nertzy2 has joined #archiveteam
[14:29] *** WinterFox has quit IRC (Remote host closed the connection)
[14:35] *** nertzy2 has quit IRC (Quit: This computer has gone to sleep)
[14:40] *** Morbus has quit IRC (http://www.disobey.com/)
[14:57] *** VADemon has quit IRC (Quit: left4dead)
[15:17] *** K4k has joined #archiveteam
[15:32] *** Froggypwn has quit IRC (Read error: Operation timed out)
[15:54] *** K4k_ has joined #archiveteam
[15:54] *** K4k_ has quit IRC (Remote host closed the connection!)
[15:54] *** K4k has quit IRC (Read error: Connection reset by peer)
[15:56] *** K4k_ has joined #archiveteam
[15:56] *** K4k_ has quit IRC (Connection closed)
[15:56] *** K4k_ has joined #archiveteam
[16:11] *** VADemon has joined #archiveteam
[16:13] *** Froggypwn has joined #archiveteam
[16:26] *** Froggypwn has quit IRC (Ping timeout: 300 seconds)
[16:29] *** Froggypwn has joined #archiveteam
[16:44] *** K4k_ has quit IRC (Read error: Connection reset by peer)
[16:45] *** K4k_ has joined #archiveteam
[17:01] *** JesseW has joined #archiveteam
[17:17] Anyone got any more news sites to add to our Newsgrabber?
[17:17] *** lbft has quit IRC (Read error: Operation timed out)
[17:17] *** lbft has joined #archiveteam
[17:24] *** JesseW has quit IRC (Leaving.)
[17:25] *** lbft has quit IRC (Read error: Operation timed out)
[17:26] where is the current list?
[17:26] 18<schbirid218> where is the current list?
[17:27] ooops
[17:27] http://newsgrabber.harrycross.me/services.html
[17:27] *** lbft has joined #archiveteam
[17:33] *** nertzy2 has joined #archiveteam
[17:48] HCross: you just need feed urls?
[17:51] either RSS or the "main news" page
[17:52] *** VADemon_ has joined #archiveteam
[17:53] *** VADemon_ has quit IRC (Read error: Connection reset by peer)
[17:54] *** VADemon_ has joined #archiveteam
[17:55] *** VADemon has quit IRC (Read error: Operation timed out)
[17:59] *** VADemon_ has quit IRC (Read error: Connection reset by peer)
[18:00] *** VADemon has joined #archiveteam
[18:00] If the site has multiple RSS feeds with different articles or has some other pages with articles only linked to from those pages, then please add the multiple RSS feeds and the different non-RSS pages
[18:02] nrk.no does that... its a mess
[18:04] *** VADemon has quit IRC (Read error: Connection reset by peer)
[18:04] *** VADemon has joined #archiveteam
[18:07] HCross: https://pastee.org/g3vrd
[18:07] i did not do all the feeds for at least focus.de and standard.at because they are messy and spammy (focus)
[18:07] ah. arkiver could we script these?
[18:08] any chance that this might mutate into a distributed archive of news? without images and videos it should be reasonably small
[18:09] part of the aim is to get images and videos
[18:12] *** nertzy2 has quit IRC (Quit: This computer has gone to sleep)
[18:18] *** golu has joined #archiveteam
[18:23] *** golu has quit IRC (Ping timeout: 241 seconds)
[18:25] *** nertzy2 has joined #archiveteam
[18:58] *** Froggypwn has quit IRC (Read error: Operation timed out)
[19:11] *** nertzy2 has quit IRC (Quit: This computer has gone to sleep)
[19:12] *** K4k_ has quit IRC (Read error: Operation timed out)
[19:14] *** JetBalsa has joined #archiveteam
[19:28] so IBM have bought ustream...
[19:41] Yeah, weird.
[19:41] $150 too
[19:41] Bargain
[19:47] *** K4k_ has joined #archiveteam
[19:52] *** scyther has joined #archiveteam
[19:53] *** K4k_ has quit IRC (Read error: Operation timed out)
[19:54] *** q3g345 has joined #archiveteam
[19:54] *** q3g345 has quit IRC (Client Quit)
[20:27] *** K4k_ has joined #archiveteam
[20:43] NewsGrabber is now covering 16 algerian newssites
[20:43] Will be many more tomorrow
[20:46] *** maseck_ has quit IRC (Remote host closed the connection)
[21:03] Great.
[21:03] Is this going into the archive yet?
[21:03] yes
[21:03] https://archive.org/search.php?query=newssites&sort=-publicdate&page=1
[21:04] They are indexed by the wayback machine after a month or so, on page 12 at the bottom are some indexed items https://archive.org/search.php?query=newssites&sort=-publicdate&page=12
[21:07] Nice nice
[21:07] *** schbirid2 has quit IRC (Quit: Leaving)
[21:11] Whose script where is doing this
[21:11] I have requests.
[21:12] its me and arkiver doing it all
[21:12] Yes, I wrote the scripts and HCross is taking care of the server
[21:12] I make sure the server doesnt catch fire, arkiver writes the code
[21:13] code is here https://github.com/ArchiveTeam/NewsGrabber
[21:13] With a readme how to add new newssites
[21:13] http://newsgrabber.harrycross.me:29000 you can see what its doing here
[21:13] All supported newssites are here https://github.com/ArchiveTeam/NewsGrabber/tree/master/services
[21:20] *** oli has quit IRC (Read error: Operation timed out)
[21:21] *** oli has joined #archiveteam
[21:25] HCross: You now have access to archiveteam_newssites. Please upload all future packs there.
[21:25] ok - thanks. arkiver is that something you need to script in?
[21:26] I gave the same access to the same account that has been doing the uploading.
[21:26] ah cool, thats my account
[21:27] I have put all 567 items you uploaded to opensource into its own collection.
[21:27] Next, please make it so the title of the item is set thus:
[21:28] Archive Team Newsgrab: (Number)
[21:28] So instead of the title being archiveteam_newssites_20160121_0021
[21:28] It should be Archive Team Newsgrab: 20160121_0021
[21:28] This is different than the itemname.
[21:28] Itemnames are fine.
[21:29] Ok. arkiver and I need to bash out a script update for that
[21:29] Additionally, they should really have a blurb.
[21:30] Frankly, they should be given some blurb text as well.
[21:30] This is not difficult stuff.
[21:30] I'm going to go fix the 567 now.
[21:30] Yea, easy to add to the script
[21:37] *** VADemon has quit IRC (Read error: No route to host)
[21:38] Almost done renaming your 567 items
[21:38] thanks
[21:39] As blurb just "Newsarticles grabbed by Archive Team."?
[21:39] (in description)
[21:40] A collection of news articles grabbed from a wide variety of sources around the world automatically by Archive Team scripts.
[21:40] *** VADemon has joined #archiveteam
[21:41] They're all getting the description right now (the 567). use that going forward.
[21:41] You'd be so proud of me for using "around the world" instead of "foreign"
[21:45] Speaking of news, I've now ISO'd a PILE of "Front Page News" CD-ROMs, which are text grabs of news stories going back to 1983
[21:47] Nice
[21:47] Awesome work you're doing on those CDs/DVDs
[21:48] This item has everything ok right? https://archive.org/details/archiveteam_newssites_20160121_0022
[21:51] *** WinterFox has joined #archiveteam
[21:52] *** wickedpla is now known as wp494
[22:06] SketchCow: you should know about this: https://archive.org/details/john-legere-twitter-periscope-video-2015-01-07
[22:06] *** K4k_ has quit IRC (Read error: Operation timed out)
[22:06] i got all of the John Legere videos from 2015-01-07 twitter/periscope
[22:21] *** VADemon has quit IRC (Read error: Connection reset by peer)
[22:21] *** Ghost_of_ has joined #archiveteam
[22:22] *** maseck has joined #archiveteam
[22:25] *** xmc is now known as chronomex
[22:25] *** chronomex is now known as xmc
[22:30] *** scyther has quit IRC (Read error: Connection reset by peer)
[22:34] Great
[22:35] arkiver: Yes, that's what I was looking for.
[22:35] *** megaminxw has joined #archiveteam
[22:35] Good. Ill add it now
[22:37] *** megaminxw has quit IRC (Read error: Connection reset by peer)
[22:38] *** megaminxw has joined #archiveteam
[23:07] Just updated and it doesnt seem the script is working properly, just waiting on an update
[23:40] ^ problem fixed
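The title and blurb rules requested between [21:27] and [21:41] could be expressed roughly as below. This is a hedged sketch, not the actual NewsGrabber upload code: `item_metadata` is a hypothetical helper, and the `YYYYMMDD_NNNN` suffix pattern is inferred from the two item names quoted in the log. The title format, description text, and collection name are taken from the discussion.

```python
import re

# Assumed item name shape, inferred from archiveteam_newssites_20160121_0021.
ITEM_RE = re.compile(r"^archiveteam_newssites_(\d{8}_\d{4})$")

# Description text given at [21:40].
DESCRIPTION = (
    "A collection of news articles grabbed from a wide variety of sources "
    "around the world automatically by Archive Team scripts."
)


def item_metadata(item_name):
    """Build title, description, and collection for a newsgrab item name.

    The item name itself is left unchanged; only the title differs from it.
    """
    match = ITEM_RE.match(item_name)
    if match is None:
        raise ValueError(f"unexpected item name: {item_name!r}")
    return {
        "title": f"Archive Team Newsgrab: {match.group(1)}",
        "description": DESCRIPTION,
        "collection": "archiveteam_newssites",
    }
```

How such a dictionary gets passed to the uploader is not shown in the log, so that step is left out here.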