#archiveteam-bs 2016-03-17,Thu


Time Nickname Message
00:04 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
00:15 🔗 Kenshin has quit IRC (Ping timeout: 260 seconds)
00:18 🔗 Start has joined #archiveteam-bs
00:20 🔗 Mayonaise has joined #archiveteam-bs
00:24 🔗 espes___ has quit IRC (Ping timeout: 250 seconds)
00:25 🔗 Kenshin has joined #archiveteam-bs
00:31 🔗 dxrt We are writing to let you know that with effect from 27 January 2016, the Slashdot Media business, which provides online services through various web sites including Slashdot.org and SourceForge.net (the "Slashdot Media Services") has been purchased by SourceForge Media LLC of 1660 Logan Avenue, San Diego, California, 92113, USA ("we" or "us").
00:48 🔗 vtyl has quit IRC (Read error: Operation timed out)
00:48 🔗 vtyl has joined #archiveteam-bs
00:49 🔗 espes__ has joined #archiveteam-bs
00:50 🔗 Frogging dxrt: yep, that happened
00:54 🔗 JesseW has joined #archiveteam-bs
00:57 🔗 Sk1d is there a collection on archive.org for youtube videos I downloaded and reuploaded there via the tubeup.py script?
01:29 🔗 ndiddy has joined #archiveteam-bs
01:29 🔗 xXx_ndidd has quit IRC (Read error: Connection reset by peer)
01:38 🔗 Sk1d SketchCow: ^ I uploaded the alphaGo videos from https://www.youtube.com/channel/UCP7jMXSY2xbc3KCAE0MHQ-A/videos
01:53 🔗 JesseW MrRadar: so I've got the fanfiction archive uncompressed -- you recommended the following arguments to p7zip -mx=9 -ms=128m -m0=LZMA2
01:53 🔗 MrRadar Before you do that, ZIP might be better if the IA can browse inside of them
01:53 🔗 JesseW but also said it might be worth trying PPMd compression; I need to figure out what the proper arguments are for that.
01:53 🔗 JesseW Ah, true -- that's a good reason to go for zip
01:54 🔗 ko_ has joined #archiveteam-bs
01:54 🔗 MrRadar I didn't think of that when I originally was for 7z
01:54 🔗 ErkDog well I was trying to help that guy with the fan fiction file
01:54 🔗 ErkDog but the warc extract kept crashing @ 13,000,000 items
01:54 🔗 JesseW I'm not sure I'd trust IA's zip viewer with a zip file containing millions of items, though
01:55 🔗 JesseW ErkDog: I tried to find the file he was referring to, but couldn't find it in the WARCs.
01:56 🔗 * JesseW will be AFK
01:56 🔗 MrRadar The AT wiki actually specifically says to prefer .zip for this reason: http://archiveteam.org/index.php?title=Internet_Archive#Uploading_to_archive.org
01:57 🔗 MrRadar When uploading to IA
01:57 🔗 JesseW Actually, I think probably the right idea is a number of zip files, each containing no more than, say, 1000 files...
01:57 🔗 MrRadar Yeah
01:58 🔗 JesseW maybe organized by folder, or first few letters of folder
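[Editor's sketch] The batching idea discussed above (several zip files, each holding no more than about 1000 files) could look roughly like the following. This is only an illustration under stated assumptions: the function names `batched` and `zip_in_batches`, the output naming scheme, and the choice of sorted path order are all hypothetical and are not anything the channel settled on.

```python
import zipfile
from itertools import islice
from pathlib import Path


def batched(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def zip_in_batches(src_dir, out_dir, max_files=1000):
    """Write the files under src_dir into numbered zips of at most max_files each.

    Hypothetical helper: out_dir should live outside src_dir.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Sorting keeps files from the same folder next to each other.
    files = sorted(p for p in src.rglob("*") if p.is_file())
    written = []
    for n, chunk in enumerate(batched(files, max_files)):
        target = out / f"{src.name}_{n:05d}.zip"
        with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in chunk:
                # Store paths relative to src so the folder layout survives.
                zf.write(path, path.relative_to(src).as_posix())
        written.append(target)
    return written
```

Calling `zip_in_batches("Fanfiction/SomeFandom", "zips")` (hypothetical paths) would produce `zips/SomeFandom_00000.zip`, `zips/SomeFandom_00001.zip`, and so on, one per batch.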
01:58 🔗 JesseW but I also want to wait on advice from SketchCow about recommended ways to handle this, too.
01:58 🔗 MrRadar That would be good
01:59 🔗 MrRadar Though he said earlier that he would be travelling today and tomorrow
01:59 🔗 JesseW yep, so I expect to be waiting till probably sometime in the weekend. That works fine for me.
02:01 🔗 JesseW up till then, I can still do various analysis on the files, figuring out how big various directories are, etc.
02:03 🔗 ko_ has quit IRC (Quit: Page closed)
02:04 🔗 JesseW for now, I'm re-generating the inventory file (all ~ 800MB of it)
02:07 🔗 JesseW has quit IRC (Quit: Leaving.)
03:02 🔗 JesseW has joined #archiveteam-bs
03:05 🔗 JesseW The inventory of the fanfictionNet grab found 6,930,546 files.
03:05 🔗 JesseW and took half an hour to generate
03:13 🔗 JesseW Interestingly, there are only 7 top-level folders (i.e. fandoms) with over 100,000 files: http://0bin.net/paste/fQ3AlxKaYJkp7b36#d4iFB00G0pA7HODIlLHWbp9cby1KCw-7jEQvbTc0ZI2
03:26 🔗 JesseW and this is the distribution of titles of completed Harry Potter stories by initial letter: http://0bin.net/paste/jt2SMbh92gtdr+dQ#n7fLpZfPbWNynm7kAPz9dfVoi3RZP23LeyDOg1T0erG
03:50 🔗 JesseW And those top 7 folders -- contain a total of about 88 GB (out of ~307G for the whole thing)
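[Editor's sketch] The log does not show how the per-fandom counts were produced. One possible way to count files under each top-level folder and pick out those above a threshold is sketched below; `files_per_top_level` and `folders_over` are hypothetical names, and walking the extracted tree directly (rather than parsing the inventory file) is an assumption.

```python
import os
from collections import Counter


def files_per_top_level(root):
    """Count files beneath each top-level folder of root.

    Files sitting directly in root are ignored.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if rel == os.curdir:
            continue
        # The first path component is the top-level folder (the fandom).
        top = rel.split(os.sep, 1)[0]
        counts[top] += len(filenames)
    return counts


def folders_over(counts, threshold=100_000):
    """Return the names of folders holding more than `threshold` files."""
    return sorted(name for name, n in counts.items() if n > threshold)
```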
04:02 🔗 robink has quit IRC (Ping timeout: 633 seconds)
04:04 🔗 tomwsmf-a has joined #archiveteam-bs
04:23 🔗 Asparagir JesseW: I would have guessed Sherlock, LOTR, and the Marvel Movies (Avengers, etc.) would be in the top for sure. But maybe that's a reflection on AO3 biases rather than FanFictionNet.
04:23 🔗 JesseW yeah, different era, I think.
04:24 🔗 Asparagir Also, boy bands.
04:24 🔗 Asparagir One Direction, K-Pop. And Teen Wolf.
04:25 🔗 Asparagir Someday, if AO3 ever opens up its database as an API, I would love to figure out how to write a recommendation engine for it. Fans who like THIS also like THAT.
04:26 🔗 robink has joined #archiveteam-bs
04:26 🔗 bwn_ has quit IRC (Read error: Operation timed out)
04:27 🔗 JesseW Well, with the new Board for AO3, maybe things will move more on that front.
04:28 🔗 JesseW I've been keeping an eye on the meeting minutes, and making sure they're in Wayback
04:35 🔗 JesseW bsmith094: In the Fanfiction grab, there's a path: Fanfiction/Fanfiction, with 8904 subdirectories -- many of which are empty. Any idea what was up with that?
04:36 🔗 yipdw_ has quit IRC (Ping timeout: 260 seconds)
04:43 🔗 JesseW Of the 169,569 directories in the grab, 8,250 of them are empty.
04:45 🔗 JesseW and all but 36 of those are under Fanfiction/Fanfiction
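[Editor's sketch] A count of empty directories like the one above can be reproduced with a short walk of the tree. This is a minimal illustration, not the method actually used in the channel; `empty_dirs` is a hypothetical name, and "empty" is assumed to mean no files and no subdirectories.

```python
import os


def empty_dirs(root):
    """Return directories under root that contain no files and no subdirectories."""
    return sorted(
        dirpath
        for dirpath, dirnames, filenames in os.walk(root)
        if not dirnames and not filenames
    )
```

Filtering the result by path prefix (for example, entries starting with the `Fanfiction/Fanfiction` path) would then show how many of the empty directories sit under one subtree.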
04:52 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
05:20 🔗 yipdw has joined #archiveteam-bs
05:28 🔗 decay has quit IRC (Read error: Operation timed out)
05:32 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
05:35 🔗 decay has joined #archiveteam-bs
05:45 🔗 bwn_ has joined #archiveteam-bs
05:55 🔗 JesseW It looks like the Fanfiction/Fanfiction one is a partial copy of the rest.
05:55 🔗 JesseW totaling about 3 G
05:56 🔗 phuzion has quit IRC (Read error: Operation timed out)
06:00 🔗 Lord_Nigh has joined #archiveteam-bs
06:00 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
06:07 🔗 Sk1d has joined #archiveteam-bs
06:08 🔗 JesseW Well, it's not an *exact* copy, there's at least one story that got re-written between the times it was grabbed.
06:14 🔗 bwn_ has quit IRC (Read error: Operation timed out)
06:18 🔗 yipdw strangely, today I was looking into hosting an instance of otwarchive
06:19 🔗 JesseW otwarchive?
06:19 🔗 yipdw the software behind AO3
06:19 🔗 JesseW ah, I didn't know the name of the software
06:19 🔗 JesseW how painful is it to set up an instance?
06:20 🔗 yipdw I got roped into a Secret Project and decided "well let's see what OTW did"
06:20 🔗 yipdw well, it's a Rails 3.2.22 app and I'm looking into Docker
06:20 🔗 yipdw on a scale from Go to de Sade, I'd say Christian Grey
06:20 🔗 JesseW whew...
06:21 🔗 JesseW are we talking about Go the game, or go as in "start"
06:21 🔗 yipdw Go the language which tends to produce stuff that's easy to work with operationally
06:21 🔗 JesseW ah
06:21 🔗 JesseW Interesting -- I hadn't actually heard that about Go-lang
06:21 🔗 yipdw outputs end up being statically linked
06:22 🔗 yipdw this is nice in some ways and terrible in others
06:22 🔗 yipdw people who develop in golang tend to have other stories, but I don't, so I can't offer any of my own
06:23 🔗 * JesseW googled for go-lang dependency hell, and it's bringing up some interesting bits
06:24 🔗 yipdw honestly the hardest part of getting otwarchive up and running (for me) is that it's written using mysql and Ruby 2.0.0, neither of which I have installed
06:24 🔗 yipdw once you get past that I expect it is just like any other Rails 3 app
06:25 🔗 JesseW IDK much about ruby, or its operational challenges
06:26 🔗 yipdw my biggest pain has come from Ruby and attendant libraries moving faster than most distros will keep up with
06:26 🔗 yipdw the libraries bit can be solved (mostly) with bundler and app-local bundles
06:27 🔗 yipdw the Ruby bit though is a bit more annoying; sometimes I work around it with rvm/rbenv/chruby but that adds another thing to get into the environment
06:28 🔗 yipdw at present I work around this by using Docker images that are preconfigured with all the right stuff in the env, but it's not like I've simplified the stack by doing this
06:28 🔗 phuzion has joined #archiveteam-bs
06:28 🔗 yipdw Piss Your Devops Staff Off With These Five Weird Tricks
06:29 🔗 JesseW lol
06:40 🔗 JesseW https://nathany.com/go-packages/
06:44 🔗 JesseW So, of the over 56,000 files in Fanfiction/Fanfiction, all but 19 of them are identical to copies in just Fanfiction/
06:44 🔗 JesseW I am not at all sure what is the right way to handle this. :-/
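[Editor's sketch] The log does not say how the nested copy was compared with the rest of the grab. One possible approach is to hash each file under the nested folder and compare it with the file at the same relative path in the parent. The names `sha256_of` and `compare_nested_copy` are hypothetical, and matching by identical relative path is an assumption that may not hold if file names changed between grabs.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def compare_nested_copy(nested, parent):
    """Classify each file under nested against the same relative path under parent."""
    nested, parent = Path(nested), Path(parent)
    result = {"identical": [], "different": [], "missing": []}
    for path in sorted(p for p in nested.rglob("*") if p.is_file()):
        rel = path.relative_to(nested)
        other = parent / rel
        if not other.is_file():
            result["missing"].append(rel)      # no counterpart in the parent tree
        elif sha256_of(path) == sha256_of(other):
            result["identical"].append(rel)    # byte-for-byte duplicate
        else:
            result["different"].append(rel)    # same path, different content
    return result
```

Files reported as `identical` are candidates for dropping from the nested copy, while `different` and `missing` entries would need a manual look before anything is discarded.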
06:58 🔗 metalcamp has joined #archiveteam-bs
07:08 🔗 godane has quit IRC (Ping timeout: 260 seconds)
07:18 🔗 Asparagir has quit IRC (Read error: Connection reset by peer)
07:19 🔗 achip has quit IRC (Ping timeout: 258 seconds)
07:20 🔗 RedType has quit IRC (Read error: Operation timed out)
07:22 🔗 RedType has joined #archiveteam-bs
07:28 🔗 achip has joined #archiveteam-bs
07:44 🔗 JesseW has quit IRC (Quit: Leaving.)
07:52 🔗 bwn has joined #archiveteam-bs
08:08 🔗 K0 has joined #archiveteam-bs
08:32 🔗 K0 has quit IRC (Quit: Page closed)
08:48 🔗 pgoetz has quit IRC (Quit: No Ping reply in 180 seconds.)
08:48 🔗 pgoetz has joined #archiveteam-bs
09:14 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
09:35 🔗 vtyl has quit IRC (Read error: Operation timed out)
09:39 🔗 lytv has joined #archiveteam-bs
10:41 🔗 Spilverga has joined #archiveteam-bs
11:05 🔗 schbirid has joined #archiveteam-bs
12:23 🔗 vitzli has joined #archiveteam-bs
13:04 🔗 TheKiwi has quit IRC (Ping timeout: 260 seconds)
13:26 🔗 pgoetz has quit IRC (Quit: No Ping reply in 180 seconds.)
13:27 🔗 VADemon has joined #archiveteam-bs
14:07 🔗 metalcamp has joined #archiveteam-bs
14:12 🔗 Start has quit IRC (Quit: Disconnected.)
14:43 🔗 Start has joined #archiveteam-bs
15:15 🔗 toad2 has joined #archiveteam-bs
15:16 🔗 toad1 has quit IRC (Read error: Operation timed out)
15:28 🔗 ohhdemgir has quit IRC (Quit: True)
15:48 🔗 JesseW has joined #archiveteam-bs
15:57 🔗 vitzli has quit IRC (Leaving)
16:02 🔗 ohhdemgir has joined #archiveteam-bs
16:05 🔗 pgoetz has joined #archiveteam-bs
16:06 🔗 Start has quit IRC (Quit: Disconnected.)
16:15 🔗 JesseW has quit IRC (Quit: Leaving.)
17:34 🔗 bwn has quit IRC (Ping timeout: 246 seconds)
17:39 🔗 JW_work has quit IRC (Read error: Operation timed out)
18:02 🔗 bwn has joined #archiveteam-bs
18:07 🔗 godane has joined #archiveteam-bs
18:19 🔗 Start has joined #archiveteam-bs
18:45 🔗 joepie91 http://www.rtlnieuws.nl/editienl/hema-klanten-krijgen-6-maanden-om-fotoalbums-af-te-maken
19:04 🔗 toad2 has quit IRC (Read error: Operation timed out)
19:07 🔗 toad1 has joined #archiveteam-bs
19:15 🔗 Smiley has joined #archiveteam-bs
19:31 🔗 Start has quit IRC (Quit: Disconnected.)
19:50 🔗 xXx_ndidd has joined #archiveteam-bs
19:50 🔗 ndiddy has quit IRC (Read error: Connection reset by peer)
21:08 🔗 bwn has quit IRC (Leaving)
21:26 🔗 schbirid has quit IRC (Quit: Leaving)
21:29 🔗 pgoetz_ has joined #archiveteam-bs
21:29 🔗 Baljem has joined #archiveteam-bs
21:33 🔗 pgoetz has quit IRC (hub.se efnet.portlane.se)
21:33 🔗 RedType has quit IRC (hub.se efnet.portlane.se)
21:33 🔗 Baljem_ has quit IRC (hub.se efnet.portlane.se)
21:33 🔗 midas has quit IRC (hub.se efnet.portlane.se)
21:33 🔗 BnA-Rob1n has quit IRC (hub.se efnet.portlane.se)
21:33 🔗 Fletcher_ has quit IRC (hub.se efnet.portlane.se)
21:33 🔗 bsmith094 has quit IRC (hub.se efnet.portlane.se)
21:35 🔗 bsmith093 has joined #archiveteam-bs
21:36 🔗 RedType_ has joined #archiveteam-bs
21:40 🔗 Stiletto is now known as Stilett0
21:45 🔗 midas1 has joined #archiveteam-bs
21:46 🔗 RichardG has quit IRC (Ping timeout: 244 seconds)
22:19 🔗 Fletcher_ has joined #archiveteam-bs
22:37 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
22:47 🔗 bsmith093 JesseW
22:48 🔗 bsmith093 The Fanfiction/Fanfiction thing was me being stupid and not clearing out the old grab before starting a new one, after finally settling on a file name format
22:52 🔗 Spilverga has quit IRC (Ping timeout: 268 seconds)
23:01 🔗 ErkDog has quit IRC (Read error: Operation timed out)
23:01 🔗 ErkDog has joined #archiveteam-bs
23:01 🔗 dashcloud has quit IRC (Ping timeout: 260 seconds)
23:04 🔗 dashcloud has joined #archiveteam-bs
23:22 🔗 Start has joined #archiveteam-bs
23:40 🔗 Rickster has quit IRC (Ping timeout: 260 seconds)
23:52 🔗 Stiletto has joined #archiveteam-bs
23:55 🔗 Rickster has joined #archiveteam-bs
