#archiveteam-bs 2015-04-02,Thu


Time Nickname Message
00:05 🔗 mistym has quit IRC (Quit: Leaving)
00:08 🔗 dashcloud joepie91_: my best guess is it's an exact copy of the one for sale on the ISO site, and so that's why the claim exists, not for the content, but because duplicates of final ISO standards are likely copies of the paid one
00:09 🔗 dashcloud as far as I know, drafts are fine to circulate, but final versions tend to be treated as more off-limits
00:20 🔗 joepie91_ dashcloud: it's a "final draft"
00:20 🔗 joepie91_ so.. still a draft
00:22 🔗 dashcloud no idea then, unless standards people do something silly like consider that the finished version
00:34 🔗 primus105 has joined #archiveteam-bs
00:37 🔗 primus104 has quit IRC (Read error: Operation timed out)
00:54 🔗 primus has joined #archiveteam-bs
00:55 🔗 kyan What the... https://web.archive.org/web/20150402005504/http://textfiles.com/jason/ "Page cannot be crawled or displayed due to robots.txt."
00:58 🔗 aaaaaaaaa works here
01:00 🔗 SN4T14_ is now known as SN4T14
01:03 🔗 SketchCow DFJustin: You had an interesting effect
01:03 🔗 SketchCow Turns out that the archive.org crawlers, once given a robots.txt block, never checked that block again.
01:03 🔗 SketchCow That is about to change
01:04 🔗 kyan Ah, it's working now. Weirdness :P
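SketchCow's point above is that a robots.txt block, once seen, was never re-checked. A minimal sketch of the alternative, assuming nothing about archive.org's actual crawler code, is a per-host cache whose entries expire, so a block that has since been lifted is eventually noticed. The `fetch` callable, the one-day default and the `RobotsCache` name are hypothetical; only `urllib.robotparser` is a real library.

```python
import time
from urllib import robotparser


class RobotsCache:
    """Per-host robots.txt cache whose entries expire after max_age seconds.

    Sketch only: `fetch` is a hypothetical callable (host -> robots.txt text)
    standing in for a real HTTP fetch, and the one-day default is arbitrary.
    """

    def __init__(self, fetch, max_age=86400.0, clock=time.time):
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._entries = {}  # host -> (fetched_at, RobotFileParser)

    def _parser_for(self, host):
        now = self._clock()
        entry = self._entries.get(host)
        # Re-fetch when the host is new or the cached copy is stale,
        # so a block that has been lifted does not stick forever.
        if entry is None or now - entry[0] > self._max_age:
            parser = robotparser.RobotFileParser()
            parser.parse(self._fetch(host).splitlines())
            entry = (now, parser)
            self._entries[host] = entry
        return entry[1]

    def can_fetch(self, host, user_agent, url):
        return self._parser_for(host).can_fetch(user_agent, url)
```

Injecting `clock` keeps the expiry logic testable without waiting a day.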
01:09 🔗 mistym has joined #archiveteam-bs
01:40 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
01:46 🔗 dashcloud has joined #archiveteam-bs
02:32 🔗 primus105 has quit IRC (Leaving.)
03:07 🔗 mistym has quit IRC (Remote host closed the connection)
03:11 🔗 useretail has quit IRC (Dreaming in digital. Living in real-time. Thinking in binary. Talking in IP.)
03:22 🔗 Ara_ has quit IRC (Read error: Operation timed out)
03:25 🔗 Ara_ has joined #archiveteam-bs
03:32 🔗 Ara_ has quit IRC (Read error: Operation timed out)
03:34 🔗 Ara_ has joined #archiveteam-bs
03:35 🔗 mistym has joined #archiveteam-bs
03:36 🔗 dashcloud has quit IRC (Read error: Operation timed out)
03:41 🔗 Ara_ has quit IRC (Read error: Operation timed out)
03:43 🔗 Ara_ has joined #archiveteam-bs
03:43 🔗 dashcloud has joined #archiveteam-bs
03:59 🔗 Ara_ has quit IRC (Read error: Operation timed out)
04:04 🔗 Ara_ has joined #archiveteam-bs
04:09 🔗 Ara_ has quit IRC (Read error: Operation timed out)
04:15 🔗 Ara_ has joined #archiveteam-bs
04:15 🔗 aaaaaaaaa has quit IRC (Leaving)
04:43 🔗 DFJustin hmm that didn't used to be the case
04:53 🔗 Ara_ has quit IRC (Read error: Operation timed out)
04:55 🔗 Ara_ has joined #archiveteam-bs
05:08 🔗 Ara_ has quit IRC (Read error: Operation timed out)
05:14 🔗 Ara_ has joined #archiveteam-bs
05:23 🔗 Ara_ has quit IRC (Read error: Operation timed out)
05:27 🔗 Ara_ has joined #archiveteam-bs
05:36 🔗 mistym has quit IRC (Remote host closed the connection)
05:38 🔗 mistym has joined #archiveteam-bs
05:45 🔗 Ara_ has quit IRC (Read error: Operation timed out)
05:47 🔗 Ara_ has joined #archiveteam-bs
06:06 🔗 Ara_ has quit IRC (Read error: Connection reset by peer)
06:08 🔗 Ara_ has joined #archiveteam-bs
06:14 🔗 arkhive has joined #archiveteam-bs
06:16 🔗 arkhive SketchCow: Does IA do free prepaid postage to their hq? I have those floppy disks. like two thousand and it will cost a lot to ship. Do they do that? Apologies if idiotic lol
06:18 🔗 Ara_ has quit IRC (Read error: Operation timed out)
06:18 🔗 arkhive Or is there someone heading to San Francisco that is passing through Colorado that can take them there for me? Let me know please. Thanks. :) and i gotta go to bed. right now it is 12:18am. goodnight. but i'll check mIRC when i wake up
06:19 🔗 Ara_ has joined #archiveteam-bs
06:21 🔗 kyan I don't know if this is helpful to anyone, but this is the solution I came up with to the problem of keeping a frequently updated directory tree in IA. :) Although, I haven't really tested it much, so it remains to be seen how it holds up under use. Anyway, hopefully this answers my question for anyone else who has it :) https://fracture-active.googlecode.com/svn/trunk/More/Patche/patche
06:28 🔗 mistym has quit IRC (Remote host closed the connection)
06:29 🔗 mistym has joined #archiveteam-bs
06:31 🔗 joepie91_ kyan: goddamn
06:31 🔗 joepie91_ that could use some more newlines :)
06:36 🔗 Ara_ has quit IRC (Read error: Operation timed out)
06:38 🔗 Ara_ has joined #archiveteam-bs
06:44 🔗 SketchCow arkhive: You should be sending them to me.
06:44 🔗 SketchCow In NY
06:48 🔗 mistym has quit IRC (Remote host closed the connection)
07:13 🔗 schbirid has joined #archiveteam-bs
08:01 🔗 primus104 has joined #archiveteam-bs
08:08 🔗 Ara_ has quit IRC (Read error: Operation timed out)
08:09 🔗 Ara_ has joined #archiveteam-bs
08:20 🔗 Ara_ has quit IRC (Read error: Operation timed out)
08:21 🔗 Ara_ has joined #archiveteam-bs
08:32 🔗 Ara_ has quit IRC (Read error: Operation timed out)
08:33 🔗 Ara_ has joined #archiveteam-bs
08:45 🔗 primus104 has quit IRC (Leaving.)
08:53 🔗 rejon has quit IRC (Remote host closed the connection)
08:53 🔗 rejon has joined #archiveteam-bs
09:09 🔗 Ara_ has quit IRC (Read error: Connection reset by peer)
09:09 🔗 Ara_ has joined #archiveteam-bs
09:27 🔗 Ara_ has quit IRC (Read error: Operation timed out)
09:28 🔗 Ara_ has joined #archiveteam-bs
09:30 🔗 brayden has quit IRC (Quit: Leaving)
09:37 🔗 primus104 has joined #archiveteam-bs
09:57 🔗 Ara_ has quit IRC (Read error: Operation timed out)
09:59 🔗 Ara_ has joined #archiveteam-bs
10:07 🔗 Ara_ has quit IRC (Read error: Operation timed out)
10:08 🔗 Ara_ has joined #archiveteam-bs
10:32 🔗 Ara_ has quit IRC (Read error: Operation timed out)
10:36 🔗 Ara_ has joined #archiveteam-bs
10:49 🔗 Ara_ has quit IRC (Read error: Connection reset by peer)
10:52 🔗 Ara_ has joined #archiveteam-bs
11:31 🔗 Ara__ has joined #archiveteam-bs
11:38 🔗 Ara_ has quit IRC (Ping timeout: 492 seconds)
11:44 🔗 Ara__ has quit IRC (Read error: Operation timed out)
11:46 🔗 Ara__ has joined #archiveteam-bs
11:48 🔗 brayden has joined #archiveteam-bs
12:01 🔗 brayden has quit IRC (Read error: Connection reset by peer)
12:06 🔗 brayden has joined #archiveteam-bs
12:09 🔗 Ara__ has quit IRC (Read error: Connection reset by peer)
12:10 🔗 Ara__ has joined #archiveteam-bs
12:19 🔗 Ara__ has quit IRC (Read error: Operation timed out)
12:20 🔗 Ara__ has joined #archiveteam-bs
12:22 🔗 midas SketchCow: your morning commute must be horrible if you drive from to SF
12:23 🔗 midas from NY*
12:29 🔗 Ara__ has quit IRC (Read error: Operation timed out)
12:30 🔗 Ara__ has joined #archiveteam-bs
12:41 🔗 primus104 has quit IRC (Leaving.)
13:07 🔗 Ara__ has quit IRC (Read error: Operation timed out)
13:09 🔗 Ara__ has joined #archiveteam-bs
13:11 🔗 Ara__ has quit IRC (Read error: Connection reset by peer)
13:12 🔗 Ara__ has joined #archiveteam-bs
13:23 🔗 BlueMaxim has quit IRC (Quit: Leaving)
13:26 🔗 Ara__ has quit IRC (Read error: Operation timed out)
13:27 🔗 Ara__ has joined #archiveteam-bs
13:42 🔗 Ara__ has quit IRC (Read error: Operation timed out)
13:46 🔗 Kazzy https://twitter.com/digitalocean/status/583625072319004673
13:47 🔗 Ara__ has joined #archiveteam-bs
13:47 🔗 Kazzy aaand the account is protected..
13:57 🔗 Ara__ has quit IRC (Read error: Operation timed out)
13:58 🔗 Ara__ has joined #archiveteam-bs
14:10 🔗 Ara__ has quit IRC (Read error: Operation timed out)
14:12 🔗 Ara__ has joined #archiveteam-bs
14:21 🔗 Ara__ has quit IRC (Read error: Operation timed out)
14:24 🔗 Ara_ has joined #archiveteam-bs
14:32 🔗 mistym has joined #archiveteam-bs
14:34 🔗 mistym has quit IRC (Remote host closed the connection)
14:45 🔗 Rotab Kazzy: huh?
14:45 🔗 Ara_ has quit IRC (Read error: Operation timed out)
14:46 🔗 Ara_ has joined #archiveteam-bs
14:47 🔗 primus104 has joined #archiveteam-bs
14:48 🔗 SketchCow My commute is nightmarish but there's this awesome diner in Illinois I stop at on the way
14:51 🔗 mistym has joined #archiveteam-bs
14:53 🔗 midas lol, just a 40 hour drive ;)
14:57 🔗 Ara_ has quit IRC (Read error: Operation timed out)
14:59 🔗 primus104 has quit IRC (Leaving.)
14:59 🔗 Ara_ has joined #archiveteam-bs
15:10 🔗 Kazzy Rotab: looked like someone had got into their acc.. turns out it was an employee having some fun
15:18 🔗 Ara_ has quit IRC (Read error: Operation timed out)
15:19 🔗 Ara_ has joined #archiveteam-bs
15:30 🔗 Ara_ has quit IRC (Read error: Operation timed out)
15:31 🔗 Ara_ has joined #archiveteam-bs
15:39 🔗 Ara_ has quit IRC (Read error: Operation timed out)
15:40 🔗 Ara_ has joined #archiveteam-bs
15:44 🔗 mistym has quit IRC (Remote host closed the connection)
15:52 🔗 Ara_ has quit IRC (Read error: Operation timed out)
15:53 🔗 Ara_ has joined #archiveteam-bs
15:58 🔗 mistym has joined #archiveteam-bs
16:28 🔗 aaaaaaaaa has joined #archiveteam-bs
16:53 🔗 bzc6p has joined #archiveteam-bs
16:53 🔗 bzc6p Hello
16:53 🔗 bzc6p I have a dilemma.
16:53 🔗 bzc6p I'm scraping an image sharing service. There are so-called private pictures, which means they don't appear in the image browser, but anyone who knows the ID can access one by typing it into the URL.
16:53 🔗 bzc6p I've seen a couple of image sharing services, and normally this ID is difficult enough that it's impossible to find by brute force.
16:53 🔗 bzc6p But in this particular case the ID is an incremental number. So if I go from 1 to infinity, I get everything, including "private" pictures.
16:53 🔗 bzc6p I'd like to know what you'd do if you were me. Shall I bother doing a discovery of the browser pages (not that difficult, just time and some work), or shall I "preserve" everything?
16:54 🔗 bzc6p Thanks for your input.
16:54 🔗 primus104 has joined #archiveteam-bs
16:54 🔗 SketchCow 1. I grab them all.
16:54 🔗 SketchCow 2. If you feel bad, you should inform the vendor as it's a security bug.
16:54 🔗 SketchCow 3. Don't feel bad
16:55 🔗 SketchCow 4. Damn, picture ID 304A3B - that's not how you use a toothbrush
16:56 🔗 schbirid bzc6p: maybe there is also metadata available that says if a pic is "meant to be private" or not
16:56 🔗 bzc6p schbirid: I know which are private and which are not. If I do a discovery, the ones listed on the browser pages are public, and only those get saved if I care.
16:57 🔗 bzc6p Question is, should I care?
16:57 🔗 schbirid dunno
16:57 🔗 schbirid your own decision :)
16:57 🔗 schbirid i'd only grab the public stuff myself, or keep the private stuff to myself, advertising that i can provide it to anyone with reasonable proof (that's how i try to handle Fileplanet)
16:58 🔗 godane http://archive.org/details/www.yelp.com-biz-memories-pizza-walkerton-20150402
16:58 🔗 bzc6p SketchCow: the system is built on a bad conception.
16:58 🔗 godane i grabbed all the comments pages and images from there
16:59 🔗 SketchCow I didn't say the problem could be easily fixed.
16:59 🔗 arkhive SketchCow: okay. I'll have to get some money together. that is a lot of money for me lol. :)
16:59 🔗 bzc6p There was another service, which listed every picture, including private ones, on a page left rw-r--r--, and Google indexed it. In that case I informed the admin and he hid the page. Well, this couldn't be done here.
16:59 🔗 SketchCow You're basically asking the standard "disclosure" question.
17:00 🔗 Smiley bzc6p: is it google+ photos?
17:00 🔗 Smiley If you can figure out the sequence, you can get all of those too.
17:01 🔗 bzc6p Well, the links must be somewhere in deep chatlogs. And one day in 2025, there will be some which the poster would like to see, others not. But it's the same with public pictures.
17:02 🔗 Smiley lol
17:02 🔗 Smiley chatlogs are hardly private
17:02 🔗 Smiley posters are stupid
17:02 🔗 Smiley grab everything \o/
17:02 🔗 bzc6p On the other hand, they still remain private, except for those people who search for and find the warc containing them
17:02 🔗 bzc6p which I suppose is almost nobody. And without knowing the person, almost no picture can be embarrassing.
17:03 🔗 bzc6p Smiley: no, its not G+
17:03 🔗 Smiley shame
17:03 🔗 Smiley have you found any interesting pictures?
17:04 🔗 schbirid /interesting/
17:04 🔗 bzc6p I don't spend much time doing that, but a few I saw were not embarrassing.
17:04 🔗 schbirid sorry if i asked before, but what can i use to unpack a warc.gz into the individual files again?
17:04 🔗 bzc6p male and female naked pictures: there are a lot of public ones, and not only porn stills but homemade pictures too.
17:06 🔗 bzc6p So 3-1 in favor of grab everything, including me preferring that
17:06 🔗 bzc6p so far
17:08 🔗 * bzc6p comes back soon to read more opinions if any
17:09 🔗 yipdw schbirid: warcat extract, https://pypi.python.org/pypi/Warcat/
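For the curious, the core of what an extractor like `warcat extract` has to do can be sketched with the standard library alone. This is not warcat's code and does not reproduce its output layout; it is a minimal sketch assuming a `.warc.gz` made of concatenated gzip members, plain WARC record headers, and HTTP payloads that are neither chunked nor content-encoded. The function names are hypothetical.

```python
import gzip


def iter_warc_records(path):
    """Yield (headers, block) for every record in a .warc.gz file.

    gzip.open reads concatenated gzip members transparently, so the
    usual one-member-per-record layout needs no special handling.
    """
    with gzip.open(path, "rb") as f:
        while True:
            line = f.readline()
            if not line:
                return  # end of file
            if not line.strip():
                continue  # blank lines separate records
            if not line.startswith(b"WARC/"):
                raise ValueError(f"expected a WARC version line, got {line!r}")
            headers = {}
            for raw in iter(f.readline, b""):
                if raw in (b"\r\n", b"\n"):
                    break  # blank line ends the header block
                name, _, value = raw.decode("utf-8").partition(":")
                headers[name.strip().lower()] = value.strip()
            block = f.read(int(headers["content-length"]))
            yield headers, block


def extract_responses(path):
    """Map each response record's target URI to its HTTP body.

    Assumes the stored HTTP response is neither chunked nor compressed.
    """
    files = {}
    for headers, block in iter_warc_records(path):
        if headers.get("warc-type") != "response":
            continue
        _http_head, _, body = block.partition(b"\r\n\r\n")
        files[headers["warc-target-uri"]] = body
    return files
```

A real tool also has to decode chunked transfer encoding and map URIs to safe file paths, which is where differences from what wget saved directly can creep in.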
17:11 🔗 yipdw bzc6p: in this particular case I'd be in favor of not grabbing the private ones
17:11 🔗 schbirid thx
17:11 🔗 yipdw the "lol stupid users don't know an auto-incrementing ID is guessable" excuse is not something I enjoy
17:11 🔗 yipdw that said another possibility is to keep a record of which are private and dark them in IA
17:36 🔗 bzc6p Well, there is another point that supports yipdw.
17:36 🔗 bzc6p Do we save content for the uploader, or for internet audience? I think we work for the latter.
17:37 🔗 bzc6p And for them, only public things matter. For those reviewing their own chatlogs and not finding their stuff in 2025 - well, why did you rely on the clown?
17:38 🔗 bzc6p Putting things on free internet services and backing up things is two entirely different concepts.
17:39 🔗 bzc6p (Using correct grammar is a third one.)
18:05 🔗 bzc6p Well, considering that I didn't save private pictures in other cases, and also considering the things above, and in lack of consensus, I think I won't save private ones.
18:05 🔗 bzc6p I'd rather spend my resources on other sources' public things.
18:06 🔗 xmc i'd suggest iterating over all the urls and saving them all, but if you can easily determine which is which, sort the private ones out into a different item
18:06 🔗 bzc6p (although a lot of "public" pictures are meant to be private, one can't tell them apart, so those must all be saved)
18:09 🔗 bzc6p xmc: I see your concept. But I can hardly imagine a situation where that's necessary. We're talking about random people's random pictures, probably not important ones.
18:09 🔗 bzc6p I mean, a photo of the plane landing on the Hudson River wouldn't be stored ONLY at places like these.
18:10 🔗 bzc6p And I can't imagine Samantha Doe writing to info@archive.org: "Dear Sir or Madam, I see some pictures from example.com are stored, but do you by any chance have picture ID 234567, my duckface selfie from 2010?"
18:11 🔗 bzc6p (Sorry, selfies with duckfaces were probably not common in 2010, but you'll get it.)
18:13 🔗 xmc we've had people come into the channel in a blind panic asking for their otherwise-deleted pictures of their grandson
18:13 🔗 xmc it's not inconceivable
18:13 🔗 xmc and it has the possibility to make that person very happy
18:14 🔗 xmc but if you want to not save it, i guess that's your decision?
18:14 🔗 xmc what's the numbers on these anyway
18:14 🔗 bzc6p hm.
18:16 🔗 bzc6p Well, once I make a discovery, it's me who needs to make the least effort to sort the pictures as public and as not.
18:16 🔗 bzc6p So it makes sense.
18:17 🔗 xmc i don't understand what you just said, could you use more words please
18:18 🔗 bzc6p Once I have the list of public picture IDs, it's easy to have the private ones. So I *should*, at least, keep a record of them.
18:18 🔗 bzc6p That's what I meant, in support of your last argument.
18:18 🔗 xmc oh
18:18 🔗 xmc yeah
18:18 🔗 xmc i'm not arguing. i just told you what i would do.
18:19 🔗 bzc6p I think I used the wrong word "argument", I meant...
18:19 🔗 xmc if you do a thing that i think is wrong, i will have a sad and then we will talk
18:19 🔗 * bzc6p takes the dictionary
18:19 🔗 xmc otherwise what's good is good
18:20 🔗 xmc what site are you working on anyway
18:22 🔗 yipdw partition isPrivate would have been easy to do by now :P
18:22 🔗 xmc grab it all, sort into separate private and public megawarcs, dark as appropriate
18:22 🔗 xmc done
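The plan xmc and yipdw describe (enumerate every sequential ID, then split on whether the picture turned up in the public discovery) can be sketched as below. The function name, the URL template and the assumption that IDs start at 1 are all hypothetical; the two lists are meant to feed separate grabs so the private pictures end up in their own item that can be darked.

```python
def partition_urls(max_id, public_ids, url_template):
    """Split picture IDs 1..max_id into (public_urls, private_urls).

    Assumptions (hypothetical): IDs are sequential integers starting at 1,
    `public_ids` holds every ID seen while crawling the public browser pages,
    and `url_template` is something like "https://img.example/{id}".
    Any ID reachable by URL but never listed publicly counts as private.
    """
    public_set = set(public_ids)  # O(1) membership checks
    public_urls, private_urls = [], []
    for pic_id in range(1, max_id + 1):
        url = url_template.format(id=pic_id)
        (public_urls if pic_id in public_set else private_urls).append(url)
    return public_urls, private_urls
```

Each list could then be written out one URL per line and handed to something like `wget --input-file` with its own `--warc-file`, giving the separate public and private WARCs xmc suggests.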
18:22 🔗 bzc6p argument (noun) [...] 4. a statement, reason, or fact for or against a point
18:22 🔗 bzc6p yours was a fact and a reason at least.
18:22 🔗 yipdw also if someone can tell me why the hell krunner in KDE 4 keeps locking up that'd be awesome
18:22 🔗 xmc ok
18:23 🔗 bzc6p xmc: I just thought I'll do so.
18:23 🔗 xmc anyway bzc6p i've told you what i prefer and why, now it's your turn to make a decision
18:23 🔗 bzc6p I appreciate your input.
18:23 🔗 xmc if you want more dictionary fun, look up "analysis paralysis", and then press pageup a few times in irc :)
18:23 🔗 xmc cool
18:24 🔗 bzc6p I didnt know anyone ever came into here in blind panic for a photo
18:25 🔗 yipdw not here specifically, but Tabblo etc. have shown that people do look for their things
18:31 🔗 bzc6p Well, then, I'll make a discovery, sort the pictures into private and public, upload them to separate items accordingly, and after that I may tell SketchCow to darken them.
18:33 🔗 bzc6p And now I'll abort the "universal" process I just started. No problem, as I found a more important project to do, now I can deal with that.
18:33 🔗 bzc6p Thank you everyone for the debate.
18:34 🔗 primus104 has quit IRC (Leaving.)
18:41 🔗 schbirid yipdw: does warcat extract to the dir my shell is currently in or relative to the warc?
18:43 🔗 schbirid nvm, i decided to live dangerous
18:43 🔗 schbirid its the current dir :)
18:59 🔗 mistym has quit IRC (Quit: Leaving)
19:00 🔗 rolf has joined #archiveteam-bs
19:12 🔗 godane i'm grabbing pypi.python.org sources
19:12 🔗 mistym has joined #archiveteam-bs
19:14 🔗 schbirid hngg, the files wget saved vs what comes out of the warc seems quite different
19:14 🔗 schbirid just looking at the list of files atm and trying to remember what i did
19:14 🔗 * ersi pours schbirid some liquor
19:15 🔗 schbirid nono, i am currently hacking into my new kobo and must not screw up
19:15 🔗 schbirid wtf http://www.mobileread.com/forums/showthread.php?t=162713 !
19:16 🔗 * joepie91_ adds Kobo to no-buy list
19:18 🔗 schbirid nah its kinda fine, it runs linux
19:21 🔗 midas 'it runs linux' seems to be the safeword for tracking. it shouldnt be there from the start, running linux or not
19:23 🔗 schbirid well, i have had my share of openpandora so i am not that keen about open hardware anymore
19:24 🔗 schbirid although that wasnt even open
19:24 🔗 midas ok this is funky
19:24 🔗 midas i just rebooted my domotica system.
19:24 🔗 midas and my fileserver
19:24 🔗 midas did not realize it was running on the same box...
19:24 🔗 midas with root credentials
19:27 🔗 godane so looks like i can grab UN videos with youtube-dl
19:27 🔗 primus104 has joined #archiveteam-bs
19:30 🔗 schbirid domotica sounds kinky
19:30 🔗 bzc6p has quit IRC (Read error: Operation timed out)
19:32 🔗 bzc6p has joined #archiveteam-bs
19:37 🔗 Smiley has quit IRC (http://www.milkme.co.uk - You'll never understand.)
19:38 🔗 midas schbirid: it kinda is
19:38 🔗 midas all my lights/heating/power is controlled from that box
19:39 🔗 bzc6p has left
19:47 🔗 mistym has quit IRC (Remote host closed the connection)
19:47 🔗 schbirid fuck me, i cannot figure out how to actually add books to it
19:48 🔗 midas copy pasta?
19:48 🔗 SN4T14_ has joined #archiveteam-bs
19:49 🔗 schbirid i dont wanna use fucking calibre :(
19:57 🔗 SN4T14 has quit IRC (Ping timeout: 512 seconds)
19:58 🔗 godane so i found old UN real media streams
19:59 🔗 godane old i mean 2010 i think
19:59 🔗 useretail has joined #archiveteam-bs
20:01 🔗 schbirid i just had to replug usb
20:01 🔗 schbirid its simply mass storage now
20:01 🔗 godane also good news is i think i'm downloading it a lot faster than the korea stuff
20:02 🔗 mistym has joined #archiveteam-bs
20:15 🔗 godane so i'm getting 10 to 15 min videos in under 60 seconds
20:15 🔗 godane from the UN
20:17 🔗 Smiley has joined #archiveteam-bs
20:34 🔗 rolf has quit IRC (Leaving...)
20:35 🔗 schbirid https://www.youtube.com/watch?v=n4Bcl1EeenM
21:10 🔗 schbirid has quit IRC (Leaving)
21:23 🔗 Smiley has quit IRC (HE'S BACCCCCK)
21:24 🔗 Smiley has joined #archiveteam-bs
21:35 🔗 jk[SVP] has quit IRC (Ping timeout: 240 seconds)
21:36 🔗 jk[SVP] has joined #archiveteam-bs
21:38 🔗 NotGLaDOS has quit IRC (Ping timeout: 240 seconds)
21:38 🔗 twrist has joined #archiveteam-bs
21:57 🔗 BlueMaxim has joined #archiveteam-bs
22:21 🔗 wtron has joined #archiveteam-bs
23:04 🔗 wtron has left
