#archiveteam 2015-09-06,Sun


Time Nickname Message
00:09 🔗 schbirid has quit IRC (Quit: Leaving)
00:37 🔗 kyan has quit IRC (Quit: This computer has gone to sleep)
00:38 🔗 kyan has joined #archiveteam
00:57 🔗 xk_id has quit IRC (Remote host closed the connection)
01:18 🔗 xk_id has joined #archiveteam
01:18 🔗 vitzli has joined #archiveteam
01:20 🔗 Guest100 has joined #archiveteam
01:20 🔗 Guest100 has quit IRC (Client Quit)
01:33 🔗 Guest100 has joined #archiveteam
01:45 🔗 aaaaaaaaa sets mode: +o chfoo
01:49 🔗 SketchCow I'm about to go into the FTP dump on FOS.
01:49 🔗 SketchCow 632gb of god knows what the fuck
01:56 🔗 * xmc hands SketchCow scuba gear
01:57 🔗 Guest100 has quit IRC (My Mac has gone to sleep. ZZZzzz…)
02:00 🔗 xk_id_ has joined #archiveteam
02:00 🔗 xk_id has quit IRC (Read error: Connection reset by peer)
02:00 🔗 BlueMaxim has joined #archiveteam
02:06 🔗 SketchCow I already found a directory with 132gb of what-the-fuck
02:06 🔗 SketchCow mysql2014-08-31.tar.gz
02:06 🔗 SketchCow somewhere2014-09-01.tar.gz
02:06 🔗 SketchCow streaming_content2010-12-03.tar.gz
02:06 🔗 SketchCow tomcat2014-08-31.tar.gz
02:06 🔗 SketchCow turbulence2014-09-01.tar.gz
02:08 🔗 kniffy very descriptive XD
02:16 🔗 SketchCow Well, after I study it slightly more, it goes up as is.
02:17 🔗 vitzli has quit IRC (Quit: Leaving)
02:24 🔗 zgrep has left When-if-ever I become an archiver, I shall join. For now... meh.
02:29 🔗 vitzli has joined #archiveteam
02:44 🔗 primus104 has quit IRC (Leaving.)
02:52 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
03:04 🔗 xk_id_ has quit IRC (Remote host closed the connection)
03:09 🔗 vitzli has quit IRC (Quit: Leaving)
03:42 🔗 phuzion has quit IRC (Read error: Operation timed out)
03:49 🔗 phuzion has joined #archiveteam
04:20 🔗 aaaaaaaaa has quit IRC (Leaving)
04:40 🔗 vitzli has joined #archiveteam
04:45 🔗 SketchCow Found it. It's the "New American Radio" site, grabbed down to the tomcat and mysql instances.
04:56 🔗 Cameron_D has quit IRC (Ping timeout: 483 seconds)
05:05 🔗 Ravenloft has quit IRC (Ping timeout: 252 seconds)
05:10 🔗 Cameron_D has joined #archiveteam
05:28 🔗 xmc that's a hell of a grab
06:09 🔗 Guest100 has joined #archiveteam
06:10 🔗 anomie SketchCow: What is that?
06:12 🔗 db48x` has quit IRC (Remote host closed the connection)
06:29 🔗 jspiros has quit IRC (Ping timeout: 186 seconds)
06:44 🔗 PurpleSym has joined #archiveteam
06:57 🔗 jspiros has joined #archiveteam
07:40 🔗 vitzli has quit IRC (Quit: Leaving)
07:52 🔗 vitzli has joined #archiveteam
07:52 🔗 Guest100 has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
08:34 🔗 Stiletto has quit IRC ()
08:35 🔗 Stiletto has joined #archiveteam
08:40 🔗 schbirid has joined #archiveteam
08:49 🔗 arkiver So does anyone have a good name for a DocStoc channel?
08:53 🔗 DFJustin docstocandbarrel
09:15 🔗 Ungstein has joined #archiveteam
09:18 🔗 arkiver2 has joined #archiveteam
09:25 🔗 arkiver2 has quit IRC (Ping timeout: 252 seconds)
09:33 🔗 primus104 has joined #archiveteam
09:48 🔗 fenn docoutofstoc
10:56 🔗 dxrt ^
11:01 🔗 arkiver2 has joined #archiveteam
11:24 🔗 arkiver2 has quit IRC (Ping timeout: 252 seconds)
11:26 🔗 arkiver2 has joined #archiveteam
11:26 🔗 primus104 has left
11:33 🔗 arkiver2 has quit IRC (Ping timeout: 252 seconds)
11:57 🔗 nmnn has joined #archiveteam
12:04 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
12:11 🔗 RichardG has joined #archiveteam
12:54 🔗 robink has quit IRC (Ping timeout: 492 seconds)
13:48 🔗 SilSte Is there an easy way to archive a discussion between two Twitter accounts?
14:00 🔗 arkiver2 has joined #archiveteam
14:04 🔗 BlueMaxim has quit IRC (Quit: Leaving)
14:11 🔗 arkiver2 has quit IRC (Ping timeout: 252 seconds)
14:28 🔗 Baljem SilSte: there isn't even an easy way to /follow/ a discussion between two Twitter accounts. bah.
14:28 🔗 khaoohs_ has joined #archiveteam
14:28 🔗 SilSte you see my problem ;)
14:28 🔗 SilSte I want something like "All the tweets between x and y from time a to z"
14:31 🔗 khaoohs has quit IRC (Read error: Operation timed out)
14:46 🔗 primus104 has joined #archiveteam
15:06 🔗 vitzli has quit IRC (Quit: Leaving)
15:07 🔗 dashcloud storify might be the best way, but if you need something that will work independently of twitter (in the case of people deleting tweets or accounts), you'll need to archive the conversation yourself- archivebot can do that
15:37 🔗 nmnn has quit IRC (Remote host closed the connection)
15:39 🔗 zenguy_pc has quit IRC (Excess Flood)
15:43 🔗 xk_id has joined #archiveteam
15:45 🔗 SmileyG has quit IRC (Remote host closed the connection)
15:45 🔗 Smiley has joined #archiveteam
15:46 🔗 zenguy_pc has joined #archiveteam
15:48 🔗 Laverne has quit IRC (Ping timeout: 369 seconds)
15:49 🔗 dxrt has quit IRC (Ping timeout: 369 seconds)
15:49 🔗 dxrt has joined #archiveteam
15:49 🔗 Laverne has joined #archiveteam
15:52 🔗 zenguy_pc has quit IRC (Excess Flood)
15:52 🔗 zenguy_pc has joined #archiveteam
16:38 🔗 atomotic has joined #archiveteam
17:10 🔗 robink has joined #archiveteam
17:32 🔗 arkiver2 has joined #archiveteam
17:39 🔗 arkiver2 Posted this in #archivebot by mistake:
17:39 🔗 arkiver2 <arkiver2>SketchCow: So Google Code is taking a little longer to start
17:39 🔗 arkiver2 <arkiver2>It needs a bit more tweaking in what will be downloaded and what not
17:39 🔗 arkiver2 <arkiver2>For example for every commit made we will download the page which shows the file changes made in the commit
17:39 🔗 arkiver2 <arkiver2>However, we will not download the files which have not been changed with the commit
17:39 🔗 arkiver2 <arkiver2>The git, hg and svn repos will be downloaded through a special project, just like SourceForge. Those files also contain all commits.
17:43 🔗 arkiver2 has quit IRC (Ping timeout: 252 seconds)
17:47 🔗 arkiver2 has joined #archiveteam
17:50 🔗 arkiver2 has quit IRC (Client Quit)
17:52 🔗 aaaaaaaaa has joined #archiveteam
17:52 🔗 swebb sets mode: +o aaaaaaaaa
17:53 🔗 scyther has joined #archiveteam
18:10 🔗 primus104 has quit IRC (Leaving.)
18:11 🔗 aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
18:12 🔗 aaaaaaaaa has joined #archiveteam
18:12 🔗 swebb sets mode: +o aaaaaaaaa
18:15 🔗 anomie arkiver: This is nice and all, but… don't you kinda feel like google should be doing this?
18:16 🔗 xmc anomie: that is the central thesis of archiveteam
18:24 🔗 SimpBrain has joined #archiveteam
18:24 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
18:53 🔗 primus104 has joined #archiveteam
18:53 🔗 scyther has quit IRC (Read error: Connection reset by peer)
19:01 🔗 khaoohs has joined #archiveteam
19:06 🔗 Start arkiver: #docstop
19:07 🔗 khaoohs_ has quit IRC (Ping timeout: 483 seconds)
19:07 🔗 khaoohs_ has joined #archiveteam
19:08 🔗 khaoohs has quit IRC (Read error: Operation timed out)
19:27 🔗 scyther has joined #archiveteam
19:53 🔗 habi has joined #archiveteam
19:54 🔗 habi has left
20:09 🔗 schbirid has quit IRC (Quit: Leaving)
20:17 🔗 arkiver Shall we do #docstop?
20:29 🔗 HCross Any eta on GoogleCode or not?
20:44 🔗 arkiver HCross: ok if I PM you when we start?
20:44 🔗 HCross Yeah, I am out tomorrow until around 2pm British time - I don't have any big boxes this time.
20:45 🔗 HCross Intending to test what https://www.scaleway.com is like for the ArchiveTeam stuff
20:45 🔗 godane has left
20:45 🔗 godane has joined #archiveteam
20:49 🔗 PurpleSym has quit IRC (Remote host closed the connection)
20:50 🔗 bsmith096 has joined #archiveteam
20:50 🔗 bsmith096 i need a command to remove all leading . characters from folder and file names, recursively,
20:51 🔗 Guest100 has joined #archiveteam
20:51 🔗 bsmith096 the way i did the fanfic grab, some folders have a dot char as the first character, so they are hidden, and some even have 2 or three dots.
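Editor's sketch for the question above: a minimal way to strip leading dots from file and folder names recursively, assuming Python 3 is available. The function name `strip_leading_dots` is made up for illustration; nobody in the channel posted this. It walks the tree bottom-up so each folder is renamed only after its contents have been handled, and it skips any rename that would collide with an existing name.

```python
import os


def strip_leading_dots(root):
    """Rename every entry under root so its name no longer starts with '.'.

    Hypothetical helper (not from the log). Returns a list of (old, new) paths.
    """
    renamed = []
    # topdown=False yields the deepest directories first, so a parent is
    # renamed only after everything inside it has already been processed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new_name = name.lstrip(".")
            if new_name == name or not new_name:
                continue  # nothing to strip, or the name was only dots
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            if os.path.lexists(new_path):
                continue  # never overwrite an existing entry
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Calling `strip_leading_dots("/path/to/grab")` on a copy of the data first is the safer choice, since the renames are not reversible once several dotted names have been collapsed.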
20:57 🔗 c_b has joined #archiveteam
20:57 🔗 bsmith096 has quit IRC (Ping timeout: 240 seconds)
21:05 🔗 Rotab HCross: thats.. really cheap
21:05 🔗 HCross They arent bad either
21:06 🔗 bsmith096 has joined #archiveteam
21:06 🔗 HCross I can see them being quite good little ArchiveTeam servers
21:06 🔗 Rotab ahhh right, it's online.net's ssd vps thing
21:06 🔗 Rotab yeah
21:07 🔗 HCross Only thing ive noticed is support is a tad slower than Onlines main support
21:07 🔗 Rotab and theyre french? :P
21:08 🔗 HCross I remember the time that Online.net's main support English left a lot to be desired
21:09 🔗 HCross Ive also worked out how to make URLTeam go on them, you need to remove the address thing from the command and off it goes
21:11 🔗 Rotab cool
21:12 🔗 Rotab works fine even though its arm?
21:12 🔗 HCross Yeah, watch HCross on the tracker and see
21:14 🔗 HCross Does seem to be 404'ing a lot - will dial it down and see
21:15 🔗 bsmith096 i'm the fanfic grab guy on reddit, here's the magnet link to the gzip file magnet:?xt=urn:btih:3E2HBHI4P4N7E3MCM4MIATPF66STOV64&amp;amp;dn=Fanfiction.tar.gz&amp;amp;tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80
21:16 🔗 bsmith096 https://www.reddit.com/r/DataHoarder/comments/3jl3qm/nearly_complete_archive_of_fanfictionnet/
21:17 🔗 PotcFdk bsmith096: Is it on archive.org? Should it be on archive.org?
21:22 🔗 PotcFdk nvm https://www.reddit.com/r/DataHoarder/comments/3jl3qm/nearly_complete_archive_of_fanfictionnet/cuqg5jw
21:25 🔗 Rotab yay. i found a peer
21:25 🔗 HCross How big is it?
21:26 🔗 Rotab 107.64 GB
21:27 🔗 Rotab bsmith096: your magnet link is broken.. :P
21:30 🔗 scyther has quit IRC (Read error: Connection reset by peer)
21:32 🔗 PotcFdk So, about 108 GB of *compressed* fanfictions, can humanity handle this?
21:32 🔗 HCross Going to URLTeam under 2 usernames to compare. HCross is a VM on my x86 OVH server and HCrossScaleway is on my ARM server
21:42 🔗 anomie bsmith096: What is that?
21:43 🔗 xk_id has quit IRC (Remote host closed the connection)
22:00 🔗 zenguy_pc has quit IRC (Read error: Connection reset by peer)
22:01 🔗 zenguy_pc has joined #archiveteam
22:03 🔗 xk_id has joined #archiveteam
22:09 🔗 SimpBrain was thinking about getting that scaleway server
22:15 🔗 bsmith097 has joined #archiveteam
22:16 🔗 Coderjoe_ has joined #archiveteam
22:18 🔗 qwebirc56 has joined #archiveteam
22:19 🔗 qwebirc56 Rotab: how do i fix it?
22:19 🔗 Rotab qwebirc56: magnet:?xt=urn:btih:3E2HBHI4P4N7E3MCM4MIATPF66STOV64&dn=Fanfiction.tar.gz&tr=udp://tracker.openbittorrent.com:80
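Editor's sketch: the link posted earlier appears to have been broken by HTML escaping of the `&` separators, and Rotab's corrected form also shows the tracker URL percent-decoded. Assuming that diagnosis, a small Python helper (the name `repair_magnet` is hypothetical) can undo both with the standard library.

```python
import html
from urllib.parse import unquote


def repair_magnet(link):
    """Undo repeated HTML escaping, then percent-encoding, in a pasted link."""
    previous = None
    # Unescape until the text stops changing: "&amp;amp;" -> "&amp;" -> "&".
    while previous != link:
        previous = link
        link = html.unescape(link)
    # Decode %3A, %2F and similar, matching the form Rotab posted.
    return unquote(link)


broken = (
    "magnet:?xt=urn:btih:3E2HBHI4P4N7E3MCM4MIATPF66STOV64"
    "&amp;amp;dn=Fanfiction.tar.gz"
    "&amp;amp;tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80"
)
print(repair_magnet(broken))
```

The percent-decoding step is optional: many clients accept the tracker parameter in its encoded form, so only the unescape loop is strictly needed to make the parameters separate correctly.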
22:19 🔗 bsmith097 has quit IRC (Ping timeout: 240 seconds)
22:19 🔗 qwebirc56 Rotab: SketchCow DFJustin swebb anyone have a thing to strip the dot characters from the front of folder and filenames?
22:26 🔗 Coderjoe has quit IRC (Ping timeout: 624 seconds)
22:29 🔗 SketchCow bash
22:29 🔗 SketchCow Wow, anomie said something adorable.
22:31 🔗 Ravenloft has joined #archiveteam
22:32 🔗 qwebirc56 SketchCow: ok, but the syntax for all the commands i've found is very strange, i just need to unhide some hidden folders and files by removing the dots from the front of their names
22:32 🔗 godane SketchCow: i'm up to 2015-05-10 of medium.com urls
22:33 🔗 anomie godane: Are we archiving that whole site?
22:33 🔗 godane yes
22:33 🔗 godane based on sitemap
22:33 🔗 anomie Nice.
22:33 🔗 godane they delete a lot of articles so needs to be done
22:34 🔗 anomie Probably a good idea. There's a lot of good stuff in there.
22:34 🔗 godane that way i can just download the full daily sitemap every few days or so
22:35 🔗 godane once its up today
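Editor's sketch: godane does not say which tools are used for the sitemap-based grab, so the following only illustrates the general idea of turning a daily sitemap into a URL list. It assumes the sitemap follows the standard sitemaps.org XML schema; the function name `sitemap_urls` and the example.com URLs are made up.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sample document, not real medium.com data.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/posts/one</loc></url>
  <url><loc> https://example.com/posts/two </loc></url>
</urlset>"""

for url in sitemap_urls(sample):
    print(url)
```

The resulting list could then be handed to whatever crawler does the actual fetching, which is also where responses such as 410 would be recorded.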
22:35 🔗 qwebirc56 SketchCow: all i've found is things to rename files not folders, and just changing the type to -d doesn't seem to help
22:36 🔗 godane anomie: 188 410 errors just in 2015-05-10 dump
22:36 🔗 anomie godane: Are we (or you alone?) going to create a cron job and update it continuously?
22:36 🔗 godane i will update it continuously
22:37 🔗 anomie Nice.
22:38 🔗 godane so 2015-05-08 has 409 urls with 410 errors
22:38 🔗 godane and 2015-05-09 has 262 with 410 errors
22:39 🔗 aaaaaaaaa anomie: you're new. godane is one of our best vacuums.
22:42 🔗 anomie Nice.
22:42 🔗 xk_id has quit IRC (Remote host closed the connection)
22:43 🔗 Start has quit IRC (Ping timeout: 306 seconds)
22:43 🔗 xk_id has joined #archiveteam
22:46 🔗 Start has joined #archiveteam
22:51 🔗 xk_id has quit IRC (Remote host closed the connection)
23:01 🔗 PotcFdk Does anyone know of any forum scrapers? I'm planning to extract texts out of a private vBulletin powered forum and don't need any markup or resources.
23:02 🔗 Guest100 has quit IRC (My Mac has gone to sleep. ZZZzzz…)
23:04 🔗 c_b2 has joined #archiveteam
23:04 🔗 c_b2 has quit IRC (Client Quit)
23:05 🔗 c_b has quit IRC (Ping timeout: 252 seconds)
23:05 🔗 wyatt8740 has quit IRC (Remote host closed the connection)
23:06 🔗 joepie91 PotcFdk: best to grab in WARC first, and then extract from that
23:06 🔗 joepie91 no idea if any such tools exist though
23:07 🔗 PotcFdk joepie91: I did some searching, but haven't been able to find anything. I guess I might have to fiddle something together by RegExing or parsing the HTML
23:10 🔗 joepie91 PotcFdk: don't use regex :)
23:10 🔗 joepie91 regex for html is bad
23:10 🔗 joepie91 PotcFdk: what languages do you speak?
23:11 🔗 PotcFdk Some scripting langs (Lua, bash), C(++), Java, a bit of Go
23:11 🔗 joepie91 hmmm.
23:11 🔗 joepie91 PotcFdk: the only one of those that I'd expect to have a reasonable HTML parser, would be Go
23:11 🔗 joepie91 like, one where you aren't busy writing boilerplate for the next 2 months
23:11 🔗 joepie91 to extract a username
23:11 🔗 joepie91 lol
23:12 🔗 PotcFdk haha
23:12 🔗 joepie91 PotcFdk: my first recommendation would generally be Cheerio (JS), and second recommendation lxml/BeautifulSoup (Python), but neither of those were in your list :P
23:13 🔗 PotcFdk I can move myself forward in JS, I just might need more time, but I guess that's okay
23:13 🔗 joepie91 PotcFdk: then Cheerio might be a good choice. it's basically jQuery without a browser
23:13 🔗 joepie91 you'd generally run it in Node.js, but technically you could run it in pretty much any JS runtime
23:13 🔗 PotcFdk Sounds interesting, that might help me
23:14 🔗 joepie91 PotcFdk: https://github.com/cheeriojs/cheerio
23:14 🔗 joepie91 PotcFdk: combine with http://cryto.net/~joepie91/blog/2015/05/04/functional-programming-in-javascript-map-filter-reduce/
23:14 🔗 joepie91 if you want nice code
23:14 🔗 joepie91 and https://docs.npmjs.com/ + https://nodejs.org/api/modules.html if you haven't used Node before
23:15 🔗 joepie91 et voila
23:15 🔗 joepie91 and I just realized that this is #archiveteam
23:15 🔗 joepie91 so we should probably move this to #archiveteam-bs
23:15 🔗 joepie91 :P
23:15 🔗 PotcFdk Truth
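Editor's sketch: joepie91 recommends Cheerio (JS) or lxml/BeautifulSoup (Python) over regex. To stay dependency-free, this example swaps in Python's standard-library `html.parser` instead, which is not what was recommended in the channel but follows the same "parse, don't regex" idea. The class name `TextExtractor`, the function `html_to_text` and the sample markup are hypothetical; real vBulletin pages would need their own selection logic.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring markup and script/style contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(markup):
    """Return the text chunks of an HTML document, one per line."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.parts)


# Made-up markup standing in for a forum post.
sample_post = (
    '<div class="post"><b>user1</b><p>Hello &amp; welcome</p>'
    "<script>var x = 1;</script></div>"
)
print(html_to_text(sample_post))
```

Following joepie91's advice, the input would ideally come from pages already saved in a WARC, so the extraction can be rerun or improved later without fetching the forum again.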
23:23 🔗 xk_id has joined #archiveteam
23:29 🔗 jspiros is the current best method for imaging old Mac GCR floppies still just using an old Mac to read/image them?
23:37 🔗 aaaaaaaa_ has joined #archiveteam
23:37 🔗 aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
23:37 🔗 swebb sets mode: +o aaaaaaaa_
23:37 🔗 aaaaaaaa_ is now known as aaaaaaaaa
