#archiveteam 2016-01-21,Thu

Time Nickname Message
00:29 🔗 vitzli has joined #archiveteam
00:31 🔗 [phire] has quit IRC (Quit: ZNC - http://znc.in)
00:32 🔗 [phire] has joined #archiveteam
00:35 🔗 WinterFox has joined #archiveteam
00:37 🔗 yipdw has quit IRC (Ping timeout: 506 seconds)
00:47 🔗 zhongfu has joined #archiveteam
00:56 🔗 yipdw has joined #archiveteam
01:00 🔗 Ghost_of_ has quit IRC (Quit: Leaving)
01:05 🔗 nertzy2 has joined #archiveteam
01:18 🔗 vitzli has quit IRC (Leaving)
01:30 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
01:37 🔗 JesseW has joined #archiveteam
01:38 🔗 megaminxw has joined #archiveteam
01:39 🔗 nertzy2 has quit IRC (Quit: This computer has gone to sleep)
02:24 🔗 schbirid2 has joined #archiveteam
02:25 🔗 SketchCow I let gamefront go. It's now uploading 3.2tb
02:25 🔗 schbirid has quit IRC (Read error: Operation timed out)
02:36 🔗 JesseW has quit IRC (Leaving.)
03:03 🔗 JesseW has joined #archiveteam
03:10 🔗 fie has joined #archiveteam
03:47 🔗 megaminxw has quit IRC (Quit: Leaving.)
04:15 🔗 JesseW New IA census data grabbed, totaling 34 GB. Now extracting the identifiers from it (should take about half an hour), to find out which ones didn't come through.
04:17 🔗 JesseW Then I'll extract just the hashes, and then I'll have to figure out how to sort a ~15G file, and diff it against another one...
04:35 🔗 BlueMaxim has joined #archiveteam
04:54 🔗 tobbez Can probably do it with sort on a box with >15GB ram
05:03 🔗 yipdw JesseW: GNU sort with -T
05:03 🔗 yipdw that instructs sort to use temporary files and merge them
05:03 🔗 yipdw you don't need gobs of RAM at that point, just enough storage
05:04 🔗 JetBalsa has quit IRC (Read error: Connection reset by peer)
05:08 🔗 JesseW cool; I also heard about -S and --parallel to adjust that
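A note on the sorting approach discussed above: when input exceeds its memory buffer, GNU sort writes sorted runs to temporary files and merges them. `-T` chooses the directory for those files, `-S` sets the buffer size, and `--parallel` sets how many sorts run concurrently. The Python sketch below illustrates the same idea (an external merge sort followed by a streaming comparison of two sorted files). It is not the tooling used in the channel; the function names, the chunk size, and the assumption that each file holds one hash per line are hypothetical.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(src_path, dst_path, tmp_dir, lines_per_chunk=1_000_000):
    """Sort the lines of src_path into dst_path while holding only one chunk in memory.

    tmp_dir plays the role of the directory given to `sort -T` (assumed to have
    enough free space for a full copy of the input).
    """
    chunk_paths = []
    with open(src_path, "r", encoding="utf-8") as src:
        while True:
            chunk = list(itertools.islice(src, lines_per_chunk))
            if not chunk:
                break
            # Normalise a missing trailing newline so merged output stays one item per line.
            chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
            chunk.sort()
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".chunk")
            with os.fdopen(fd, "w", encoding="utf-8") as out:
                out.writelines(chunk)
            chunk_paths.append(path)

    # Merge all sorted runs lazily; heapq.merge reads one line per run at a time.
    files = [open(p, "r", encoding="utf-8") for p in chunk_paths]
    try:
        with open(dst_path, "w", encoding="utf-8") as dst:
            dst.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
        for p in chunk_paths:
            os.remove(p)


def missing_from(sorted_new_path, sorted_old_path):
    """Yield lines present in the old sorted file but absent from the new one."""
    with open(sorted_old_path, encoding="utf-8") as old, \
            open(sorted_new_path, encoding="utf-8") as new:
        new_line = next(new, None)
        for old_line in old:
            # Advance the new file until it catches up with the current old line.
            while new_line is not None and new_line < old_line:
                new_line = next(new, None)
            if new_line != old_line:
                yield old_line.rstrip("\n")
```

Both inputs to `missing_from` must have been produced by `external_sort` (or sorted in the same code-point order), since the comparison relies on that ordering. With very many chunks the merge step can run into the open-file limit, so a larger `lines_per_chunk` is the simpler choice on a machine with spare RAM.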
05:16 🔗 Muad-Dib has joined #archiveteam
05:53 🔗 dcmorton has joined #archiveteam
05:53 🔗 dcmorton has quit IRC (Excess Flood)
05:53 🔗 dcmorton has joined #archiveteam
06:43 🔗 vitzli has joined #archiveteam
07:02 🔗 megaminxw has joined #archiveteam
07:19 🔗 megaminxw has quit IRC (Ping timeout: 633 seconds)
07:23 🔗 megaminxw has joined #archiveteam
07:39 🔗 vitzli has quit IRC (Leaving)
07:51 🔗 JesseW has quit IRC (Leaving.)
08:19 🔗 atomotic has joined #archiveteam
08:20 🔗 atomotic has quit IRC (Client Quit)
08:31 🔗 atomotic has joined #archiveteam
08:58 🔗 terburg has joined #archiveteam
09:22 🔗 terburg_ has joined #archiveteam
09:26 🔗 terburg has quit IRC (Ping timeout: 499 seconds)
09:26 🔗 terburg_ is now known as terburg
09:43 🔗 terburg has quit IRC (Ping timeout: 258 seconds)
10:21 🔗 SketchCow Working on this: http://fos.textfiles.com/MENU/
11:02 🔗 SilSte has quit IRC (Quit: No Ping reply in 180 seconds.)
11:02 🔗 SilSte has joined #archiveteam
11:02 🔗 Kazzy has quit IRC (Ping timeout: 260 seconds)
11:04 🔗 Kazzy has joined #archiveteam
11:11 🔗 VADemon has joined #archiveteam
12:12 🔗 arkiver3 has joined #archiveteam
12:22 🔗 WinterFox has quit IRC (Remote host closed the connection)
12:24 🔗 WinterFox has joined #archiveteam
12:33 🔗 arkiver3 has quit IRC (Ping timeout: 252 seconds)
13:06 🔗 arkiver3 has joined #archiveteam
13:07 🔗 megaminxw has quit IRC (Quit: Leaving.)
13:10 🔗 arkiver3 has quit IRC (Ping timeout: 252 seconds)
13:41 🔗 atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
13:59 🔗 nertzy2 has joined #archiveteam
13:59 🔗 BlueMaxim has quit IRC (Quit: Leaving)
14:19 🔗 nertzy2 has quit IRC (Quit: This computer has gone to sleep)
14:24 🔗 nertzy2 has joined #archiveteam
14:29 🔗 WinterFox has quit IRC (Remote host closed the connection)
14:35 🔗 nertzy2 has quit IRC (Quit: This computer has gone to sleep)
14:40 🔗 Morbus has quit IRC (http://www.disobey.com/)
14:57 🔗 VADemon has quit IRC (Quit: left4dead)
15:17 🔗 K4k has joined #archiveteam
15:32 🔗 Froggypwn has quit IRC (Read error: Operation timed out)
15:54 🔗 K4k_ has joined #archiveteam
15:54 🔗 K4k_ has quit IRC (Remote host closed the connection!)
15:54 🔗 K4k has quit IRC (Read error: Connection reset by peer)
15:56 🔗 K4k_ has joined #archiveteam
15:56 🔗 K4k_ has quit IRC (Connection closed)
15:56 🔗 K4k_ has joined #archiveteam
16:11 🔗 VADemon has joined #archiveteam
16:13 🔗 Froggypwn has joined #archiveteam
16:26 🔗 Froggypwn has quit IRC (Ping timeout: 300 seconds)
16:29 🔗 Froggypwn has joined #archiveteam
16:44 🔗 K4k_ has quit IRC (Read error: Connection reset by peer)
16:45 🔗 K4k_ has joined #archiveteam
17:01 🔗 JesseW has joined #archiveteam
17:17 🔗 HCross Anyone got any more news sites to add to our Newsgrabber?
17:17 🔗 lbft has quit IRC (Read error: Operation timed out)
17:17 🔗 lbft has joined #archiveteam
17:24 🔗 JesseW has quit IRC (Leaving.)
17:25 🔗 lbft has quit IRC (Read error: Operation timed out)
17:26 🔗 schbirid2 where is the current list?
17:26 🔗 HCross <schbirid2> where is the current list?
17:27 🔗 HCross ooops
17:27 🔗 HCross http://newsgrabber.harrycross.me/services.html
17:27 🔗 lbft has joined #archiveteam
17:33 🔗 nertzy2 has joined #archiveteam
17:48 🔗 schbirid2 HCross: you just need feed urls?
17:51 🔗 HCross either RSS or the "main news" page
17:52 🔗 VADemon_ has joined #archiveteam
17:53 🔗 VADemon_ has quit IRC (Read error: Connection reset by peer)
17:54 🔗 VADemon_ has joined #archiveteam
17:55 🔗 VADemon has quit IRC (Read error: Operation timed out)
17:59 🔗 VADemon_ has quit IRC (Read error: Connection reset by peer)
18:00 🔗 VADemon has joined #archiveteam
18:00 🔗 arkiver If the site has multiple RSS feeds with different articles, or has other pages with articles only linked to from those pages, then please add all of the RSS feeds and the different non-RSS pages
18:02 🔗 Atluxity nrk.no does that... it's a mess
18:04 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
18:04 🔗 VADemon has joined #archiveteam
18:07 🔗 schbirid2 HCross: https://pastee.org/g3vrd
18:07 🔗 schbirid2 i did not do all the feeds for at least focus.de and standard.at because they are messy and spammy (focus)
18:07 🔗 HCross ah. arkiver could we script these?
18:08 🔗 schbirid2 any chance that this might mutate into a distributed archive of news? without images and videos it should be reasonably small
18:09 🔗 HCross part of the aim is to get images and videos
18:12 🔗 nertzy2 has quit IRC (Quit: This computer has gone to sleep)
18:18 🔗 golu has joined #archiveteam
18:23 🔗 golu has quit IRC (Ping timeout: 241 seconds)
18:25 🔗 nertzy2 has joined #archiveteam
18:58 🔗 Froggypwn has quit IRC (Read error: Operation timed out)
19:11 🔗 nertzy2 has quit IRC (Quit: This computer has gone to sleep)
19:12 🔗 K4k_ has quit IRC (Read error: Operation timed out)
19:14 🔗 JetBalsa has joined #archiveteam
19:28 🔗 Smiley so IBM have bought ustream...
19:41 🔗 SketchCow Yeah, weird.
19:41 🔗 SketchCow $150 too
19:41 🔗 SketchCow Bargain
19:47 🔗 K4k_ has joined #archiveteam
19:52 🔗 scyther has joined #archiveteam
19:53 🔗 K4k_ has quit IRC (Read error: Operation timed out)
19:54 🔗 q3g345 has joined #archiveteam
19:54 🔗 q3g345 has quit IRC (Client Quit)
20:27 🔗 K4k_ has joined #archiveteam
20:43 🔗 arkiver NewsGrabber is now covering 16 algerian newssites
20:43 🔗 arkiver Will be many more tomorrow
20:46 🔗 maseck_ has quit IRC (Remote host closed the connection)
21:03 🔗 SketchCow Great.
21:03 🔗 SketchCow Is this going into the archive yet?
21:03 🔗 arkiver yes
21:03 🔗 arkiver https://archive.org/search.php?query=newssites&sort=-publicdate&page=1
21:04 🔗 arkiver They are indexed by the wayback machine after a month or so, on page 12 at the bottom are some indexed items https://archive.org/search.php?query=newssites&sort=-publicdate&page=12
21:07 🔗 HCross Nice nice
21:07 🔗 schbirid2 has quit IRC (Quit: Leaving)
21:11 🔗 SketchCow Whose script is doing this, and where?
21:11 🔗 SketchCow I have requests.
21:12 🔗 HCross it's me and arkiver doing it all
21:12 🔗 arkiver Yes, I wrote the scripts and HCross is taking care of the server
21:12 🔗 HCross I make sure the server doesn't catch fire, arkiver writes the code
21:13 🔗 arkiver code is here https://github.com/ArchiveTeam/NewsGrabber
21:13 🔗 arkiver With a readme on how to add new newssites
21:13 🔗 HCross http://newsgrabber.harrycross.me:29000 you can see what it's doing here
21:13 🔗 arkiver All supported newssites are here https://github.com/ArchiveTeam/NewsGrabber/tree/master/services
21:20 🔗 oli has quit IRC (Read error: Operation timed out)
21:21 🔗 oli has joined #archiveteam
21:25 🔗 SketchCow HCross: You now have access to archiveteam_newssites. Please upload all future packs there.
21:25 🔗 HCross ok - thanks. arkiver is that something you need to script in?
21:26 🔗 SketchCow I gave the same access to the same account that has been doing the uploading.
21:26 🔗 HCross ah cool, that's my account
21:27 🔗 SketchCow I have put all 567 items you uploaded to opensource into its own collection.
21:27 🔗 SketchCow Next, please make it so the title of the item is set thus:
21:28 🔗 SketchCow Archive Team Newsgrab: (Number)
21:28 🔗 SketchCow So instead of the title being archiveteam_newssites_20160121_0021
21:28 🔗 SketchCow It should be Archive Team Newsgrab: 20160121_0021
21:28 🔗 SketchCow This is different than the itemname.
21:28 🔗 SketchCow Itemnames are fine.
21:29 🔗 HCross Ok. arkiver and I need to bash out a script update for that
21:29 🔗 SketchCow Additionally, they should really have a blurb.
21:30 🔗 SketchCow Frankly, they should be given some blurb text as well.
21:30 🔗 SketchCow This is not difficult stuff.
21:30 🔗 SketchCow I'm going to go fix the 567 now.
21:30 🔗 HCross Yea, easy to add to the script
21:37 🔗 VADemon has quit IRC (Read error: No route to host)
21:38 🔗 SketchCow Almost done renaming your 567 items
21:38 🔗 HCross thanks
21:39 🔗 arkiver As blurb just "Newsarticles grabbed by Archive Team."?
21:39 🔗 arkiver (in description)
21:40 🔗 SketchCow A collection of news articles grabbed from a wide variety of sources around the world automatically by Archive Team scripts.
21:40 🔗 VADemon has joined #archiveteam
21:41 🔗 SketchCow They're all getting the description right now (the 567). use that going forward.
21:41 🔗 SketchCow You'd be so proud of me for using "around the world" instead of "foreign"
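The metadata rules requested above (title derived from the item name, a fixed description, uploads into archiveteam_newssites) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the dictionary keys are hypothetical, and the actual NewsGrabber upload code is not shown in this log.

```python
# Item names stay as they are; only the human-readable title changes.
ITEM_PREFIX = "archiveteam_newssites_"

# Description text as given by SketchCow in the channel.
DESCRIPTION = (
    "A collection of news articles grabbed from a wide variety of sources "
    "around the world automatically by Archive Team scripts."
)


def item_metadata(item_name):
    """Build metadata for an item such as 'archiveteam_newssites_20160121_0021'.

    The key names below are assumptions made for this sketch.
    """
    if not item_name.startswith(ITEM_PREFIX):
        raise ValueError(f"unexpected item name: {item_name!r}")
    number = item_name[len(ITEM_PREFIX):]
    return {
        "collection": "archiveteam_newssites",
        "title": f"Archive Team Newsgrab: {number}",
        "description": DESCRIPTION,
    }
```

For the example from the conversation, `item_metadata("archiveteam_newssites_20160121_0021")["title"]` gives `Archive Team Newsgrab: 20160121_0021`, while the item name itself is left unchanged.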
21:45 🔗 SketchCow Speaking of news, I've now ISO'd a PILE of "Front Page News" CD-ROMs, which are text grabs of news stories going back to 1983
21:47 🔗 arkiver Nice
21:47 🔗 arkiver Awesome work you're doing on those CDs/DVDs
21:48 🔗 arkiver This item has everything ok right? https://archive.org/details/archiveteam_newssites_20160121_0022
21:51 🔗 WinterFox has joined #archiveteam
21:52 🔗 wickedpla is now known as wp494
22:06 🔗 godane SketchCow: you should know about this: https://archive.org/details/john-legere-twitter-periscope-video-2015-01-07
22:06 🔗 K4k_ has quit IRC (Read error: Operation timed out)
22:06 🔗 godane i got all of the John Legere videos from 2015-01-07 twitter/periscope
22:21 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
22:21 🔗 Ghost_of_ has joined #archiveteam
22:22 🔗 maseck has joined #archiveteam
22:25 🔗 xmc is now known as chronomex
22:25 🔗 chronomex is now known as xmc
22:30 🔗 scyther has quit IRC (Read error: Connection reset by peer)
22:34 🔗 SketchCow Great
22:35 🔗 SketchCow arkiver: Yes, that's what I was looking for.
22:35 🔗 megaminxw has joined #archiveteam
22:35 🔗 HCross Good. I'll add it now
22:37 🔗 megaminxw has quit IRC (Read error: Connection reset by peer)
22:38 🔗 megaminxw has joined #archiveteam
23:07 🔗 HCross Just updated, and it doesn't seem the script is working properly; just waiting on an update
23:40 🔗 arkiver ^ problem fixed
