#archiveteam 2014-10-20,Mon


Time Nickname Message
00:11 🔗 aaaaaaaaa bebzol: you could download the tracker dev env and set up a network on virtualbox for it and just change the tracker_host in your pipeline.py file
00:14 🔗 aaaaaaaaa https://github.com/ArchiveTeam/archiveteam-dev-env or follow the directions here: http://archiveteam.org/index.php?title=Dev/Tracker
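The pipeline.py change aaaaaaaaa describes usually amounts to pointing the tracker constants at the local dev tracker. A minimal sketch — the TRACKER_ID/TRACKER_HOST names follow the common ArchiveTeam pipeline convention, and the address is a placeholder for a VirtualBox host-only network, not a real tracker:

```python
# Sketch of the top of a seesaw pipeline.py, pointed at a local dev tracker.
# TRACKER_ID and the 192.168.x address are placeholders for your own setup.
TRACKER_ID = 'testproject'
TRACKER_HOST = '192.168.56.101'  # local dev tracker instead of tracker.archiveteam.org

def tracker_url(endpoint):
    """Build a tracker URL the way the pipeline's tasks typically do."""
    return 'http://%s/%s/%s' % (TRACKER_HOST, TRACKER_ID, endpoint)
```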
00:20 🔗 bebzol yay, thanks :)
02:41 🔗 dashcloud any idea why wpull is telling me "ImportError: No module named 'sqlalchemy'" ? I used the pip install wpull method
03:00 🔗 garyrh try pip install -U wpull
05:11 🔗 amerrykan i'm trying to pull down every video from LA Podfest 2014
05:11 🔗 amerrykan i've got all but three videos, youtube-dl fails with a weird error
05:13 🔗 amerrykan Go Bayside! - player.vimeo.com/video/107790103
05:13 🔗 amerrykan Road Stories - player.vimeo.com/video/107786707
05:13 🔗 amerrykan The JV Club - player.vimeo.com/video/107793309
05:13 🔗 amerrykan i'm about to head to bed, but if anyone can suggest, i'd appreciate it
05:32 🔗 danneh_ amerrykan: I'm giving them a shot, will let you know how they go
05:36 🔗 amerrykan if they even start for you, that's further than I've got
05:38 🔗 danneh_ yep, first one finished and second one started, maybe try: pip install --upgrade youtube-dl
05:39 🔗 danneh_ or it could've blocked you for doing too many at one time? though I've downloaded about 30 at once and not had issues
05:40 🔗 amerrykan i'm on arch, so freshness shouldn't be my problem
05:41 🔗 amerrykan i'm getting 'unable to extract info' type errors
05:44 🔗 danneh_ fair enough, that's weird
05:46 🔗 danneh_ I'd probably still try either that pip or youtube-dl -U , since that issue is generally related to out-of-date extractor info and shouldn't hurt anything
05:47 🔗 danneh_ in any case, I can upload these if all else fails
08:05 🔗 danneh_ amerrykan: downloaded both of those, let me know if you want me to upload them somewhere
08:05 🔗 danneh_ all three of those*
08:32 🔗 sharpobje hi, any way to find a particular twitch vod on the internet archive?
08:32 🔗 sharpobje I am looking for the (only?) vod saved for the channel leveluplive2, with highlight ID 1854389
08:42 🔗 Atluxity sharpobje: lurk around, someone will be with you eventually
10:53 🔗 bebzol hello! I'm developing seesaw and lua scripts for archiving ownlog.com service - could you create an ownlog-grab github repository for it?
12:06 🔗 Muad-Dib arkiver, ivan` ^
15:18 🔗 stevenola SketchCow: Looking for your opinion on something
15:19 🔗 stevenola I run artpacks.org, and I've had someone contact me asking for full packs that they've participated in to be removed from the archive
15:19 🔗 stevenola Because their art is being indexed by google, and contains their real name, mailing address, phone number, exgirlfriend names, etc
15:20 🔗 stevenola I've already added their art to robots.txt as a quick fix for this issue
15:20 🔗 stevenola And I have no intention to remove full packs
15:21 🔗 stevenola But I'm curious what you think about this situation
15:22 🔗 DFJustin I'd suggest adding a robots.txt rule to whitelist ia_archiver
15:22 🔗 balrog stevenola: how were these packs produced?
15:22 🔗 DFJustin because what you really care about is google etc
15:22 🔗 stevenola I've written (but not yet sent) an email describing what I did with robots.txt, and offered to censor their phone numbers and personal details. I think I'm willing to remove the specific art, but I'm still curious to hear your thoughts
15:22 🔗 balrog were they collected from other places?
15:22 🔗 balrog I'd censor the personal details, since that's what they're worried about
15:23 🔗 stevenola balrog: it's old artscene artwork. It was produced by the artists, published by an "artgroup" and distributed to many sources by the group via BBS, FTP and web.
15:23 🔗 raylee what'd i miss?
15:23 🔗 raylee damn bnc
15:25 🔗 aaaaaaaaa A part of me can't help but think that it is available elsewhere and they put it on the internet and they knew it was publicly posted
15:25 🔗 DFJustin raylee: http://badcheese.com/~steve/atlogs/?chan=archiveteam
15:25 🔗 balrog DFJustin: whitelist ia_archiver globally
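The global whitelist balrog and DFJustin suggest would look roughly like this in robots.txt — the /packs/ path is hypothetical, and ia_archiver is the Internet Archive's crawler user-agent:

```
# Allow the Internet Archive's crawler everything
User-agent: ia_archiver
Disallow:

# Block everyone else (Google etc.) from the sensitive packs
User-agent: *
Disallow: /packs/
```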
15:25 🔗 balrog aaaaaaaaa: hah, yeah
15:26 🔗 stevenola aaaaaaaaa: yes, artpacks were basically distributed in a "hey, here's the file. pass it around!" way
15:26 🔗 stevenola i understand the artist's concern
15:27 🔗 stevenola just looking for other perspectives or thoughts i haven't considered
15:28 🔗 aaaaaaaaa I'd whitelist archive.org though. No use in potentially deleting it forever and it won't show up unless you specifically look for it.
15:29 🔗 stevenola have i done it correctly? http://artpacks.org/robots.txt
15:30 🔗 stevenola SketchCow: Since you're familiar with the artscene, your thoughts are greatly appreciated (when you get a chance)
15:30 🔗 schbirid looks correct to me
15:31 🔗 schbirid if you want, someone from here could initiate a full crawl for archive.org
15:31 🔗 schbirid (while ignoring robots.txt)
15:31 🔗 aaaaaaaaa did you delete them too, I'm getting 404s
15:31 🔗 DFJustin stevenola: you might try emailing him instead since he seems to be afk
15:31 🔗 aaaaaaaaa on some
15:31 🔗 stevenola No worries about that. The actual content is all over. I think most of the pre-2004 content is on archive.org already
15:33 🔗 stevenola aaaaaaaaa: ah, my script generated the urls to be blocked incofrrectly :0
15:33 🔗 stevenola :)
15:33 🔗 stevenola goddamn this new keyboard
15:33 🔗 stevenola DFJustin: do you have a contact email for him?
15:35 🔗 DFJustin jason@textfiles.com
15:37 🔗 stevenola thank you!
17:19 🔗 joepie91 stevenola: I don't think IA is indexed by Google
17:19 🔗 joepie91 so if the concern is name find-ability, that shouldn't be an issue
17:19 🔗 joepie91 err
17:19 🔗 joepie91 IA is indexed
17:19 🔗 joepie91 I meant I don't think the wayback is indexed by Google *
17:21 🔗 SketchCow Boop.
17:22 🔗 SketchCow I never respond to those.
17:58 🔗 stevenola Ah. Maybe that would have been a good strategy
17:58 🔗 stevenola :)
17:59 🔗 stevenola "strategy"
18:10 🔗 signius stevenola, it's that thing: the Internet is written in INK, not pencil
18:12 🔗 stevenola preaching to the choir
18:12 🔗 signius :D
18:44 🔗 namespace So why does the warrior lose all its data on shutdown?
18:45 🔗 aaaaaaaaa it reformats on startup
18:45 🔗 namespace Yes but why.
18:46 🔗 Jonimus to make sure it has a clean slate such that the next run doesn't run into issues with space or leftover data.
18:46 🔗 namespace Oh well, I have to shut down my computer sometimes and feel incredibly guilty losing you guys 1.2 gigs of data.
18:47 🔗 chronomex you can hit the "suspend" button in virtualbox
18:47 🔗 aaaaaaaaa you can pause the virtual machine, it can usually start right back up where it left off.
18:47 🔗 chronomex yeah
18:47 🔗 namespace chronomex: Oh so that is how you're supposed to do it?
18:47 🔗 namespace Okay.
18:47 🔗 Jonimus If you tell the warrior to shut down using the web interface it will shutdown once the data is sent.
18:47 🔗 chronomex ^
18:47 🔗 chronomex but that can take a little while, depending
18:47 🔗 namespace Well that'll take way too long. :P
18:47 🔗 Jonimus yeah
18:48 🔗 Jonimus also, they usually do a few release claims towards the end of a project to make sure data lost in that manner is grabbed by someone else.
18:49 🔗 namespace I mean this seems like sort of a 'gotcha' to me and I feel like there's probably some better solution.
18:50 🔗 aaaaaaaaa Then just save the state when you close it and want to shut off the computer; unless I am missing your point.
18:50 🔗 Jonimus there is like a vbox setting to have it suspend rather than shutdown boxes when you shutdown
18:51 🔗 namespace aaaaaaaaa: My point is that this isn't intuitive for a first time user to know to do.
18:52 🔗 Jonimus which is why we have the release claims methodology.
18:52 🔗 namespace K.
18:52 🔗 yipdw also why it's good practice for items to not be too big
18:52 🔗 chronomex yeah
18:52 🔗 chronomex ten minutes of downloading is a nice number
18:53 🔗 namespace Well it doesn't help that my upload is like a soda straw compared to my download.
18:53 🔗 namespace I'm not sure if that's on my end or Archive.org's end.
18:53 🔗 Jonimus as it is for most home users.
18:54 🔗 Jonimus and many projects are uploaded to a staging server rather than directly to Archive.org
18:54 🔗 namespace I mean obviously the warrior is rate limiting, and it's very plausible that the staging server/etc has trouble with receiving the data as fast as it's being grabbed.
18:55 🔗 chronomex maybe we should investigate overlapping the upload and download phases
18:55 🔗 Jonimus that depends on the thing being grabbed, I know that was an issue for twitch as there were a large number of VPS's being used.
18:55 🔗 Jonimus They kinda already do overlap.
18:55 🔗 chronomex hm, ok
18:55 🔗 chronomex i'm not very up on things
18:56 🔗 Jonimus It starts the next download as it uploads the previous task.
18:56 🔗 chronomex oh, yeah, i guess it does
18:56 🔗 chronomex my bad
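The overlap Jonimus describes — starting the next download while the previous item uploads — can be sketched with a background upload thread. The download/upload callables here are stand-ins for illustration, not the real seesaw tasks:

```python
import threading

def process_items(items, download, upload):
    """Download each item; upload the previous one in the background.

    While item N+1 downloads, the upload thread for item N is still
    running, so the two phases overlap the way a seesaw pipeline does.
    """
    pending = None  # thread uploading the previous item, if any
    for item in items:
        data = download(item)       # grab the next item...
        if pending is not None:
            pending.join()          # ...while the last upload finishes
        pending = threading.Thread(target=upload, args=(item, data))
        pending.start()
    if pending is not None:
        pending.join()              # drain the final upload
```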
18:58 🔗 bebzol hi! anyone here can create me a github repository?
18:59 🔗 bebzol i need ownlog-grab to start a rescue of a blogging platform
18:59 🔗 bebzol i'm almost done with scripts and lua
19:02 🔗 namespace bebzol: What blogging platform?
19:02 🔗 yipdw ownlog
19:03 🔗 * namespace still wants to hit ravearchive
19:03 🔗 sharpobje hi, any way to find a particular twitch vod on the internet archive?
19:03 🔗 sharpobje I am looking for the (only?) vod saved for the channel leveluplive2, with highlight ID 1854389
19:03 🔗 bebzol its ownlog.com - a platform for about 45 000 blogs in Poland
19:03 🔗 bebzol it's rotting away as its owners don't seem to care
19:03 🔗 namespace bebzol: How can I help?
19:05 🔗 bebzol I can prepare seesaw script and lua (almost done). I've created a list of all items to download - just don't know what to do next ;). I suppose I should put this on github repository and send someone an item list (about 45 000 items - each item is a particular subdomain)
19:06 🔗 garyrh you can also create your own repo then transfer ownership over to archiveteam
19:08 🔗 bebzol this may be an idea
19:09 🔗 bebzol whom should I contact to do it?
19:10 🔗 garyrh probably yipdw or chfoo
19:10 🔗 aaaaaaaaa I'd make the repo, test it with your own tracker and then let us know when it is done. Then the admins will take a look.
19:11 🔗 bebzol all right
19:15 🔗 aaaaaaaaa I think most of the admins are currently taking care of their day jobs.
19:24 🔗 yipdw ^
19:24 🔗 yipdw also if someone knows of a good way to trace leaks in Tomcat's fucking connection pool that would be awesome
19:24 🔗 yipdw logAbandoned property seems to do jack shit
19:25 🔗 midas best way to trace stuff in tomcat is to shoot it with a tank.
19:25 🔗 yipdw not an option
19:26 🔗 yipdw bebzol: shoot me your github username, I'll get the repo and permissions set up
19:27 🔗 bebzol it's "basement-labs"
19:27 🔗 bebzol thanks in advance
19:28 🔗 midas yipdw: are you using eclipse for tracing yet?
19:29 🔗 yipdw midas: IntelliJ, but I suspect I can do something similar
19:29 🔗 yipdw the production application isn't configured with remote debugging etc though
19:29 🔗 yipdw I guess I could turn that on
19:29 🔗 yipdw anyway #-bs
19:29 🔗 midas yeah lets move over there
19:29 🔗 yipdw bebzol: invitation emailed, repo online
19:29 🔗 bebzol thx :)
20:26 🔗 bebzol does anyone know how to debug a pipeline script? I get info that wget failed - but no further info
21:25 🔗 dserodio bebzol I've never tried, but I know Python. did you try adding a "-v" to wget_args (around line 216) ?
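The change dserodio suggests is just adding the flag to the wget_args list in pipeline.py. A sketch — the surrounding list contents and the wget-lua path are typical placeholders, not bebzol's actual script:

```python
# Typical wget_args fragment from an ArchiveTeam-style pipeline.py.
# Adding '--verbose' (or '-v') makes wget log what it is doing, which
# helps when the pipeline only reports that wget failed.
wget_args = [
    './wget-lua',               # placeholder path to the wget binary
    '--verbose',                # the debugging flag dserodio suggests
    '--output-file', 'wget.log',
    '--tries', '20',
]
```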
21:26 🔗 bebzol unfortunately - no wget output is printed - that is the problem
21:26 🔗 bebzol but I've already resolved my problem :)
21:28 🔗 ersi_ It's probably because the exit/return code wasn't as the pipeline wished
21:33 🔗 bebzol nah, I didn't set variables in python - item_type and item_value. I suppose this is important :P
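For reference, item_type and item_value in these pipelines are conventionally split out of the tracker's item_name in the prepare step — which is why leaving them unset breaks the later tasks, as bebzol found. A hedged sketch (the 'blog:' naming scheme is an assumption for illustration):

```python
def prepare_item(item_name):
    """Split a tracker item name like 'blog:example' into the
    item_type/item_value pair the rest of the pipeline expects."""
    item_type, item_value = item_name.split(':', 1)
    return {'item_type': item_type, 'item_value': item_value}
```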
